How to Fix Slow Redis Queries

JayJay

Redis is the easiest database in the world to make fast and the easiest to make catastrophically slow. The reason is the same in both cases: it runs commands on a single thread. If your commands are O(1), Redis serves a million operations per second on a laptop. If one client runs KEYS * on a million-key database, every other client waits, and your monitoring graphs go from green to red in seconds.

That single-threaded design changes how you think about performance. Postgres slowness is a planner problem. MongoDB slowness is a working-set problem. Redis slowness is almost always one of three things: a blocking O(N) command on a big collection, memory pressure forcing evictions, or network round-trips that should have been pipelined. This guide walks through how to diagnose each and the configuration that prevents the worst of them.

Why Redis isn't slow the way SQL is slow

Redis has no query planner, no statistics, no JIT, no shared buffers. There's nothing to tune in the way you'd tune Postgres. What you have instead is:

  1. One main thread that runs every command. Commands have documented time complexity. An O(N) command on a million-element collection blocks the server for the duration. That blocking time is the real bottleneck.

  2. An in-memory dataset. Once your data fits in RAM, reads and writes are nanoseconds. Once it doesn't, you're either evicting hot keys or swapping, both of which are catastrophic. Memory is the limit; CPU rarely is.

  3. Synchronous client-server protocol. Each command without pipelining is a round-trip. Latency to Redis is dominated by network latency, not Redis itself. A loop of GET calls across AWS availability zones has a floor of roughly 0.5 to 2ms per command.

So when someone says "Redis is slow," they almost always mean one of:

  • A specific O(N) command on a specific key is blocking the main thread.
  • The working set exceeds maxmemory and the eviction policy is causing churn.
  • The application is making too many round-trips when it could batch.
  • The server is doing background work (RDB snapshot, AOF rewrite, expiry cycles) that's stalling the event loop.

Each has different diagnostics and different fixes.

Step 1: The diagnostic stack

Redis gives you everything you need out of the box. There are no extensions to install.

SLOWLOG

SLOWLOG records commands whose execution time exceeded a configurable threshold, in microseconds. The defaults (a 10,000-microsecond threshold and 128 entries) are too lax. Lower the threshold to catch more:

CONFIG SET slowlog-log-slower-than 1000
CONFIG SET slowlog-max-len 256

1000 microseconds is 1ms. On a Redis instance where every operation is supposed to be sub-millisecond, anything above 1ms is a candidate for investigation.

For persistent settings, put them in redis.conf:

slowlog-log-slower-than 1000
slowlog-max-len 256

Read the log:

SLOWLOG GET 10
1) 1) (integer) 14
   2) (integer) 1716919200
   3) (integer) 6234
   4) 1) "HGETALL"
      2) "user:12345:metadata"
   5) "10.0.1.5:54321"
   6) "app-1"

The entries are id, unix timestamp, duration in microseconds, command and args, client address, and client name. The 6234-microsecond HGETALL on user:12345:metadata tells you the hash is bigger than expected.
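To make the field order concrete, here is a minimal Python sketch that labels one raw entry. It assumes the reply has already been fetched and decoded into nested lists in the order described above; the function name `describe_slowlog_entry` is illustrative and not part of any client library.

```python
from datetime import datetime, timezone

def describe_slowlog_entry(entry):
    """Turn one raw SLOWLOG GET entry into a labelled dict."""
    entry_id, started_at, duration_us, command, client_addr, client_name = entry
    return {
        "id": entry_id,
        "started_at": datetime.fromtimestamp(started_at, tz=timezone.utc).isoformat(),
        "duration_ms": duration_us / 1000,   # SLOWLOG reports microseconds
        "command": " ".join(command),
        "client": client_addr,
        "client_name": client_name,
    }

# The entry from the listing above, written as nested Python lists.
raw_entry = [14, 1716919200, 6234, ["HGETALL", "user:12345:metadata"],
             "10.0.1.5:54321", "app-1"]
summary = describe_slowlog_entry(raw_entry)
print(f'{summary["command"]} took {summary["duration_ms"]} ms')
```

Client libraries often return these entries already parsed, so a helper like this is mainly useful when working from raw replies or saved logs.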

Reset when you want a fresh window:

SLOWLOG RESET

SLOWLOG only catches commands above the threshold and after they run. It doesn't tell you about blocking events caused by RDB forks, expiry storms, or AOF rewrites. For those, use LATENCY.

LATENCY monitoring

Where SLOWLOG records command duration, LATENCY records every event longer than a threshold, including internal events that aren't tied to a single command.

Enable it:

CONFIG SET latency-monitor-threshold 100

This records any event longer than 100ms. Lower for stricter SLAs.

LATENCY DOCTOR produces a human-readable diagnostic:

LATENCY DOCTOR
Dave, I have observed latency spikes in this Redis instance.
You can try to identify the cause by analyzing the spikes:

1. event 'fork': 5 latency spikes (average 234ms, mean deviation 45ms,
   period 3600.00 sec). Max latency: 312ms.

This is the place to start any latency investigation. LATENCY DOCTOR lists the events that have been slow, the typical duration, and concrete suggestions ("consider disabling RDB snapshots if you're using AOF, the fork is what's causing the spikes").

LATENCY HISTORY <event> gives you the time series for a specific event class:

LATENCY HISTORY fork

The event types worth knowing:

  • fork: the cost of forking to write an RDB snapshot or AOF rewrite. On large datasets, this can be hundreds of milliseconds, all stalled on the main thread.
  • aof-rewrite-diff-write: time spent merging AOF buffer into the rewritten file.
  • expire-cycle: time spent expiring keys. On bursts of expiries (TTL storm), this can spike.
  • event-loop: total event loop latency. A useful overall health metric.

INFO and INFO commandstats

INFO is the catch-all server status. Worth knowing what's in each section. Performance-relevant ones:

INFO memory
INFO stats
INFO commandstats
INFO clients

INFO memory shows used_memory_human, used_memory_peak_human, and mem_fragmentation_ratio. A fragmentation ratio above 1.5 means a lot of memory is being held but unused; consider enabling active defragmentation.

INFO commandstats is the underused gem. It shows every command, the number of calls, and the average microseconds per call:

cmdstat_get:calls=1234567,usec=2345678,usec_per_call=1.90
cmdstat_hgetall:calls=8923,usec=45678901,usec_per_call=5119.23

That HGETALL averaging 5ms tells you some hash is too big; SLOWLOG tells you which one. The most useful metric is usec_per_call; sort by that descending and you get your hot path's worst commands.
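Sorting by average cost is easy to script. The sketch below parses commandstats lines and ranks commands by microseconds per call, recomputed from calls and usec. The sample lines are made-up values in the same format, and `parse_commandstats` is a hypothetical helper name.

```python
def parse_commandstats(lines):
    """Parse 'cmdstat_x:calls=..,usec=..' lines into (command, calls, avg_usec) tuples."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the section header and blank lines
        name, _, fields = line.partition(":")
        stats = dict(field.split("=", 1) for field in fields.split(","))
        calls = int(stats["calls"])
        usec = int(stats["usec"])
        avg_usec = usec / calls if calls else 0.0
        rows.append((name[len("cmdstat_"):], calls, avg_usec))
    # Slowest average first
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Illustrative input, not taken from a real server.
sample = [
    "# Commandstats",
    "cmdstat_get:calls=1000,usec=2000,usec_per_call=2.00",
    "cmdstat_hgetall:calls=10,usec=50000,usec_per_call=5000.00",
]
ranked = parse_commandstats(sample)
for command, calls, avg_usec in ranked:
    print(f"{command:10s} calls={calls:<6d} avg={avg_usec:.2f}us")
```

Extra fields that newer servers append to each line are carried along in the dict and ignored.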

MONITOR (handle with care)

MONITOR streams every command the server processes to the connected client. It's the only way to see exactly what your application is doing in real time. It's also crushingly expensive:

MONITOR

On a busy server, MONITOR can cut throughput in half. The client also can't keep up with the firehose. Never leave it running. Use it for 30-second bursts on a non-production replica or during a maintenance window.

A safer alternative for "what's running right now": CLIENT LIST.

CLIENT LIST and CLIENT KILL

CLIENT LIST

Shows every connected client with the IP, command, idle time, and current state. Useful for finding the client that's running a blocking command:

CLIENT KILL ID 12345

Drops a specific client. The blocking command may finish before the kill takes effect, but the connection won't accept new commands.
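CLIENT LIST output is one line of key=value pairs per connection, which makes it easy to filter in a script. The following sketch flags connections with a large pending output buffer (the omem field). The sample lines are shortened and invented for illustration, and the one-megabyte threshold is an assumption, not a recommended value.

```python
def parse_client_list(text):
    """Parse CLIENT LIST output into one dict per connection."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Space-separated key=value pairs; a value may be empty (for example name=).
        clients.append(dict(pair.split("=", 1) for pair in line.split(" ") if "=" in pair))
    return clients

def backed_up_clients(clients, max_omem_bytes=1_000_000):
    """Return clients whose pending output exceeds the assumed byte threshold."""
    return [c for c in clients if int(c.get("omem", 0)) > max_omem_bytes]

# Shortened, made-up lines; real output carries more fields per client.
sample = (
    "id=7 addr=10.0.1.5:54321 name=app-1 idle=0 oll=0 omem=0 cmd=get\n"
    "id=9 addr=10.0.1.9:40112 name=report idle=3 oll=412 omem=8437760 cmd=lrange\n"
)
slow = backed_up_clients(parse_client_list(sample))
print([(c["id"], c["name"], c["omem"]) for c in slow])
```

The id field of a flagged client is the value CLIENT KILL ID expects.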

redis-cli --bigkeys and --hotkeys

redis-cli --bigkeys samples the keyspace and reports the biggest key in each data type:

BASH
redis-cli --bigkeys

# Biggest hash found 'user:12345:sessions' has 458732 fields
# Biggest list  found 'queue:pending' has 8921734 elements

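It walks the keyspace with SCAN and ranks keys by element count (or string length), not bytes, and it reports only the single biggest key per type. That's not an exhaustive audit, but it surfaces obvious problems fast. Run during a quiet period on a replica.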

redis-cli --hotkeys finds frequently-accessed keys, but only if maxmemory-policy is set to one of the LFU variants. Worth flipping the policy temporarily if you suspect a hot-key problem.

There's also redis-cli --memkeys, which ranks keys by memory usage, and redis-cli --latency, which measures end-to-end latency from the client.

RedisInsight

RedisInsight is the official GUI client from Redis. The features worth opening it for:

  • The slow log viewer with command-level breakdown.
  • The memory analyzer that scans key patterns and shows distribution.
  • The profiler view that wraps MONITOR with sampling.

For self-hosted Redis monitoring, the redis_exporter Prometheus exporter plus Grafana is the standard production stack. The "Redis Dashboard" community dashboards on grafana.com are a fine starting point.

rdb-tools for offline analysis

For a deeper dive than --bigkeys can give, use the rdb-tools Python package on an RDB snapshot:

BASH
rdb -c memory dump.rdb > memory.csv

You get a CSV of every key with its data type, encoding, and memory usage. Sort and analyze without affecting production. Invaluable for one-time audits.

Step 2: Understand command complexity

Redis has no query planner. Instead, every command's time complexity is documented in the Redis docs. Reasoning about complexity replaces EXPLAIN.

Complexity | What it means | Examples
O(1) | Constant time | GET, SET, INCR, HGET, EXISTS, LPUSH, RPUSH, SADD, SISMEMBER
O(log N) | Logarithmic | ZADD, ZRANK, ZRANGE (small ranges)
O(N) on small N | Bounded linear | LPOP with a count, SMEMBERS on small set, HGETALL on small hash
O(N) on big N | Unbounded linear | LRANGE 0 -1 on big list, KEYS *, SUNION on big sets
O(N+M) | Linear in input plus output | SORT, large ZRANGEBYSCORE

The danger is the O(N) on unbounded N. Those commands are fast in development on small datasets and disastrous in production at scale, because Redis is single-threaded: while iterating a million-element set, no other client can do anything. Every Redis postmortem includes at least one O(N) command that someone didn't realize was O(N).

The commands to be careful about

A non-exhaustive list of high-risk commands:

  • KEYS pattern: O(N) over the entire keyspace. Never use in production.
  • LRANGE list 0 -1: O(N) over the list. Never on unbounded lists.
  • SMEMBERS bigset: O(N). Use SSCAN.
  • HGETALL bighash: O(N). Use HSCAN or design smaller hashes.
  • SUNION, SINTER, SDIFF on large sets: O(N+M).
  • DEBUG SLEEP n: literally sleeps the main thread. Useful for chaos engineering, dangerous in production.
  • FLUSHALL, FLUSHDB: O(N) over all keys, blocks the server. Use FLUSHALL ASYNC in Redis 4+.
  • SAVE: synchronous RDB save. Blocks the server. Use BGSAVE.

The pattern: any command that touches every element of a collection is dangerous on a big collection. The defense is bounded reads (paginate, SCAN, range-limited operations) or smaller collections by design.

Check key size before reading

Before reading a collection in full, check its size:

LLEN big-list
SCARD big-set
HLEN big-hash
ZCARD big-zset
STRLEN big-string

Any of these returning a number in the hundreds of thousands should make you stop and reconsider.
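One way to build this check into application code is a small guard that maps the type reported by TYPE to its size command. This is a sketch: the 10,000-element budget is an arbitrary assumption, and `size_check` and `safe_to_read_fully` are hypothetical helper names.

```python
# Size command for each value that TYPE can report.
SIZE_COMMAND = {
    "string": "STRLEN",
    "list": "LLEN",
    "set": "SCARD",
    "hash": "HLEN",
    "zset": "ZCARD",
    "stream": "XLEN",
}

def size_check(key_type, key):
    """Return the command (as an argument tuple) that reports a key's size."""
    return (SIZE_COMMAND[key_type], key)

def safe_to_read_fully(size, limit=10_000):
    """Decide whether a full read is acceptable; the limit is an assumed budget."""
    return size <= limit

print(size_check("hash", "user:12345:sessions"))
print(safe_to_read_fully(458732))
```

With redis-py, the tuple could be sent with `r.execute_command(*size_check(...))`, and the returned number passed to `safe_to_read_fully` before choosing between a full read and a paginated one.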

MEMORY USAGE for byte-level inspection

MEMORY USAGE user:12345:sessions

Returns the bytes used by a single key. Useful when LLEN is bounded but per-element size is large (a list of giant JSON strings, for example).

Step 3: The big-key problem

The single most common Redis pathology. One huge key (a 50MB string, a list with a million entries, a hash with 500,000 fields) becomes a permanent performance liability:

  • Reads of the full key are slow O(N) operations.
  • Replication of the key blocks the primary's main thread.
  • Snapshot saves require forking with that key in memory.
  • Eviction is delayed because the key is "still in use."
  • Backups are slow.

Find them with redis-cli --bigkeys for a quick scan, or rdb-tools for an exhaustive audit. Then refactor.

Patterns that produce big keys

The usual culprits:

  • Per-user sets that grow unboundedly. A set of every product a user has viewed.
  • Hash-as-database. A single hash with hundreds of thousands of field-value pairs because someone "didn't want to manage TTLs on individual keys."
  • List-as-log. A list that accumulates events without being trimmed.
  • Sorted-set-as-feed. A sorted set that records every action without retention.

Fixes

The fix is usually decomposition:

  • Shard by hash. Instead of user:12345:viewed, use user:12345:viewed:{shard} where shard is hash(item_id) % 16. Each shard has 1/16 the entries.
  • Bound and trim. For lists and sorted sets, use LTRIM and ZREMRANGEBYRANK to enforce a max size:
    LPUSH events:user:12345 "..."
    LTRIM events:user:12345 0 999  # keep most recent 1000
    
  • Split by time. Instead of audit:user:12345 (all time), use audit:user:12345:2026-05 (per month). Old months drop off naturally.
  • Use HyperLogLog for cardinality. If you only need unique counts, replace the set with PFADD / PFCOUNT. Constant ~12KB regardless of cardinality.
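The shard-by-hash idea can be sketched in a few lines of Python. The hash function (zlib.crc32) and the helper names are assumptions made for illustration; any hash that is stable across processes works. The shard number is appended as a plain suffix here because curly braces in a key name have a special meaning in Redis Cluster.

```python
import zlib

def shard_key(base_key, member, shards=16):
    """Pick the shard key for a member using a hash that is stable across processes."""
    # zlib.crc32 is deterministic, unlike Python's built-in hash() for strings.
    shard = zlib.crc32(member.encode("utf-8")) % shards
    return f"{base_key}:{shard}"

def all_shard_keys(base_key, shards=16):
    """Every shard key, for reads that need the whole logical set."""
    return [f"{base_key}:{i}" for i in range(shards)]

key = shard_key("user:12345:viewed", "item-987")
print(key)  # one of user:12345:viewed:0 .. user:12345:viewed:15
```

Writes and membership checks touch exactly one shard. A read of the full logical set has to visit every key from `all_shard_keys`, which is the price of keeping each piece small.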

When refactoring, the deletion of the old big key is itself dangerous. DEL big-key is O(N). Use UNLINK big-key instead, which schedules the deletion in a background thread (Redis 4+).

Step 4: The hot-key problem

A single key that handles a large fraction of traffic. Common examples:

  • A global config object loaded on every request.
  • A counter for a popular item.
  • A rate-limit bucket for a popular endpoint.
  • The current session for a celebrity user.

Because Redis is single-threaded, one busy key occupies the main thread to the exclusion of everything else. You can't scale a hot key by adding more Redis instances; the key lives on one shard.

Detect them with redis-cli --hotkeys (requires an LFU eviction policy). OBJECT FREQ keyname, available since Redis 4.0 and also LFU-only, returns the key's logarithmic access-frequency counter:

CONFIG SET maxmemory-policy allkeys-lfu
OBJECT FREQ "global:config"

Fixes

Once identified, the standard fixes:

  • Cache in the application. Read the value from Redis once per process per N seconds, hold a local copy. Trades memory for round-trips.
  • Shard the key. For a counter, write to counter:{shard} and sum on read. For a single-value key, this doesn't help; only useful for accumulators.
  • Use Redis-side scripts. Combine read-modify-write into a single atomic Lua call instead of round-tripping. Reduces command count, doesn't eliminate the hot spot.
  • Move to a different store. For genuinely contended counters, consider an in-memory atomic in your application or a CRDT-backed store.
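The first fix, caching in the application, can be as small as the sketch below. The loader is a stand-in for whatever call fetches the value from Redis, and the class name, TTL, and injected clock are illustrative assumptions. It is not thread-safe as written.

```python
import time

class LocalCache:
    """Hold one value in process memory and refresh it at most every ttl seconds."""

    def __init__(self, loader, ttl_seconds=5.0, clock=time.monotonic):
        self._loader = loader        # callable that fetches the value, e.g. a Redis GET
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None      # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value

# Stand-in loader that counts how often the backing store is hit.
calls = {"count": 0}

def load_config():
    calls["count"] += 1
    return {"feature_x": True}

fake_now = [0.0]  # controllable clock so the example is deterministic
cache = LocalCache(load_config, ttl_seconds=5.0, clock=lambda: fake_now[0])
for _ in range(1000):
    cache.get()
loads_within_ttl = calls["count"]

fake_now[0] = 6.0  # move past the TTL
cache.get()
loads_after_expiry = calls["count"]
print(loads_within_ttl, loads_after_expiry)
```

A thousand reads inside the TTL window cost one load, so the hot key sees one request per process per interval instead of one per request. The trade-off is that each process may serve a value up to ttl seconds stale.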

Step 5: Pick the right data structure

In Redis, the data structure is the index. A query that's O(N) with one structure is often O(1) with another. Choose deliberately.

Strings for cached objects

For a simple cache of an opaque blob (serialized JSON, a session payload), a string is the right tool:

SET session:abc123 "{...}"  EX 3600
GET session:abc123

For structured data where you read individual fields, prefer a hash.

Hashes for objects with field access

Storing a user object as a hash is more compact than separate keys, and lets you read individual fields:

HSET user:12345 name "Jane" email "jane@example.com" plan "pro"
HGET user:12345 plan
HMGET user:12345 name email

For small hashes (by default up to 512 fields with values up to 64 bytes, configurable via hash-max-listpack-entries and hash-max-listpack-value), Redis stores them as a packed listpack, which is dramatically smaller than the separate-keys approach. This is one of the big memory wins available to you.

The trap: a hash with thousands of fields ceases to be compact, and HGETALL becomes O(N). Don't use hashes as an unbounded key-value store.

Sorted sets for time-ordered data

If you need "the 20 most recent events for this user" plus queries by time window, a list is a poor fit. LRANGE 0 19 is cheap at the head of a list, but a list can't look up by timestamp, and reaching into the middle of a long list is O(N). Use a sorted set scored by timestamp:

ZADD events:user:12345 1716919200 "event-1"
ZADD events:user:12345 1716919260 "event-2"

ZREVRANGE events:user:12345 0 19 WITHSCORES

Insertion is O(log N), range read is O(log N + M) where M is the result size. Compare to a list, where pushing and reading at either end is cheap but any offset deep into a million-element list costs O(N) for the seek.

Trim the sorted set to bound it:

ZREMRANGEBYRANK events:user:12345 0 -1001  # keep most recent 1000

Sets for membership

Checking "is X in this collection" with a list is O(N). With a set, it's O(1):

SADD active:users user:12345
SISMEMBER active:users user:12345

Set operations (SUNION, SINTER, SDIFF) are useful for "users who like both A and B"-style queries. They're O(N+M); efficient on small inputs, dangerous on large ones.

HyperLogLog for cardinality

Counting unique values without storing every one:

PFADD daily-visitors user-1 user-2 user-3 user-4
PFCOUNT daily-visitors

~0.8% error, 12KB max per key, no matter how many uniques you add. For "how many unique visitors today?" use cases, this saves orders of magnitude of memory compared to a set.

Streams for append-only logs

Since Redis 5.0, streams are a first-class data type for append-only logs with consumer groups:

XADD events:orders * type "order_placed" id "12345"
XREAD COUNT 100 STREAMS events:orders 0

Streams handle the "many producers, many consumers, with offsets and acknowledgment" case much better than lists. They have a configurable max length (XADD events:orders MAXLEN ~ 10000 *), built-in consumer group semantics, and replay support.

For workloads that look like Kafka but smaller, Streams beat both list-based queues and pub/sub.

Geospatial commands

For "find things near a location," GEOADD and GEOSEARCH use a sorted set under the hood with geohash encoding. Don't roll your own:

GEOADD shops 13.361389 38.115556 "Shop A"
GEOSEARCH shops FROMLONLAT 13.5 38.0 BYRADIUS 100 km ASC COUNT 10

GEOSEARCH replaced the old GEORADIUS family in 6.2. Use the new commands; they're more flexible and just as fast.

Bitmaps for bit-packed data

For dense numeric data (counters, flags per user, daily activity tracking), SETBIT and BITCOUNT are a memory-tiny way to store millions of booleans:

SETBIT active:2026-05-28 12345 1
BITCOUNT active:2026-05-28

One byte per 8 users per day instead of one key per user per day. For analytics use cases, this is transformative.
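The memory arithmetic is easier to see with a small in-process model. The sketch below imitates SETBIT and BITCOUNT on a Python bytearray, assuming one bitmap per day with the numeric user ID as the bit offset. It is a model of the layout, not a client for Redis.

```python
def setbit(bitmap: bytearray, offset: int, value: int) -> None:
    """Set one bit, growing the buffer as needed.

    Bit 0 is the most significant bit of byte 0, matching Redis bitmaps.
    """
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(bitmap):
        bitmap.extend(b"\x00" * (byte_index + 1 - len(bitmap)))
    mask = 1 << (7 - bit_index)
    if value:
        bitmap[byte_index] |= mask
    else:
        bitmap[byte_index] &= ~mask & 0xFF

def bitcount(bitmap: bytearray) -> int:
    """Count the set bits, like BITCOUNT without a range."""
    return sum(bin(byte).count("1") for byte in bitmap)

day = bytearray()
for user_id in (3, 12345, 999_999):
    setbit(day, user_id, 1)
print(bitcount(day), len(day))
```

The bitmap is sized by the highest offset written, not by the number of active users, so this layout pays off when IDs are dense and small. Sparse or very large IDs waste space.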

Probabilistic structures via RedisBloom

The RedisBloom module adds Bloom filters, count-min sketch, and top-k. For "have I seen this URL before?" workloads where false positives are tolerable but false negatives aren't, a Bloom filter uses a fraction of the memory of a set.

If you're on Redis Stack (or the open-source Bloom module), this is the right tool for cardinality-bounded membership.

Step 6: Rewrite the access patterns

Most slow Redis workloads aren't slow because of any one command. They're slow because of patterns that pile up cost.

Never KEYS in production

KEYS * is O(N) over the entire keyspace, blocks the server, and is one of the most common causes of Redis outages. Use SCAN instead:

# Blocks the server, possibly for seconds
KEYS user:*

# Iterates in chunks, never blocking
SCAN 0 MATCH user:* COUNT 100

SCAN returns a cursor and a batch. Loop until the cursor returns to 0. The contract is "every key that existed at the start and end of the scan will be returned at least once," with possible duplicates for keys modified during the scan.

The same pattern applies to HSCAN, SSCAN, and ZSCAN for big collections. If you're going to inspect them, do it in chunks.

COUNT 100 is a hint; the server may return more or fewer. For matching scans (MATCH user:*), the server applies the filter after retrieving the batch, so you might get empty batches for sparse matches. Don't be confused by them.
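Put together, a SCAN loop looks like the sketch below. It is written against any client object whose scan(cursor, match, count) method returns a (next_cursor, keys) pair, which is the shape redis-py uses. FakeScanClient is a stand-in defined only so the example runs without a server.

```python
def scan_all(client, pattern, count=100):
    """Yield each matching key once, following the cursor until it returns to 0."""
    seen = set()
    cursor = 0
    while True:
        cursor, batch = client.scan(cursor=cursor, match=pattern, count=count)
        for key in batch:            # a batch may be empty while the cursor is non-zero
            if key not in seen:      # SCAN may repeat keys, so de-duplicate
                seen.add(key)
                yield key
        if cursor == 0:
            break

class FakeScanClient:
    """Stand-in for a Redis client, used only to make this sketch runnable."""

    def __init__(self, pages):
        self._pages = pages          # maps cursor -> (next_cursor, keys)

    def scan(self, cursor=0, match=None, count=None):
        return self._pages[cursor]

client = FakeScanClient({
    0: (17, ["user:1", "user:2"]),
    17: (5, []),                     # empty batch, scan not finished
    5: (0, ["user:2", "user:3"]),    # duplicate key, then cursor back to 0
})
found = list(scan_all(client, "user:*"))
print(found)
```

The de-duplication set grows with the number of matches, so for very large keyspaces it may be better to make the per-key work idempotent and drop the set.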

Pipeline round-trips

Each command without pipelining is a network round-trip. For 100 commands at 1ms RTT, that's 100ms minimum, regardless of how fast Redis is. Pipelining batches them into a single round-trip:

PY
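import redis

r = redis.Redis()  # assumes a server on localhost:6379
keys = [f"key{i}" for i in range(100)]  # placeholder keys

# Slow: 100 round-trips
for key in keys:
    value = r.get(key)

# Fast: one round-trip
pipe = r.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC wrapper
for key in keys:
    pipe.get(key)
values = pipe.execute()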

MGET does the same for GET specifically. Prefer it when you have many keys:

MGET key1 key2 key3 key4 key5

For sets, SADD accepts multiple values at once:

SADD myset a b c d e f g

Most commands have a "multi-value" form. Use them.

Cluster-aware pipelining

In a Redis Cluster, keys live on different shards based on their hash slot. A pipeline that mixes keys from multiple slots fails or fans out depending on the client. To pipeline reliably, ensure all keys in a pipeline are on the same slot via hash tags:

MGET user:{12345}:name user:{12345}:email user:{12345}:plan

The {12345} curly braces tell Redis Cluster to hash only that portion, so all three keys end up on the same slot. This is the standard way to keep related keys co-located in cluster setups.
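The slot calculation can be reproduced client-side to check that a group of keys will land together. This sketch follows the published rule (CRC16, XMODEM variant, modulo 16384, hashing only the text inside the first non-empty pair of braces). The function names are my own.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Return the cluster slot (0..16383) for a key, honoring hash tags."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:   # an empty "{}" is not a hash tag
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384

keys = ["user:{12345}:name", "user:{12345}:email", "user:{12345}:plan"]
slots = {hash_slot(k) for k in keys}
print(slots)  # a single slot number
```

On a live cluster, CLUSTER KEYSLOT reports the same number, which is a quick way to confirm the calculation.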

Lua scripts for atomic read-modify-write

A client that does GET, modifies, then SET has both a race condition and two round-trips. A Lua script runs atomically on the server in one round-trip:

LUA
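-- KEYS[1]: counter key, ARGV[1]: amount to add
local current = redis.call("GET", KEYS[1])
-- GET returns false for a missing key, so fall back to 0
local updated = tonumber(current or 0) + tonumber(ARGV[1])
redis.call("SET", KEYS[1], updated)
return updated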

Use EVAL for one-offs, or SCRIPT LOAD plus EVALSHA for hot scripts (the script gets cached on the server by SHA). Most Redis clients handle this automatically.

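Lua scripts run on the main thread, so don't write long-running scripts. The 5-second default (lua-time-limit, renamed busy-reply-threshold in Redis 7) is not a hard cap: once a script exceeds it, Redis keeps running the script and answers other clients with BUSY errors until it finishes or is killed. The practical limit is much lower if you don't want to stall other clients.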

Functions in Redis 7+

Redis 7 introduced Functions, which are like Lua scripts but explicitly registered, versioned, and persisted with the dataset. For scripts you use repeatedly, Functions are the preferred mechanism going forward. Lua scripts via EVAL still work; Functions are easier to manage.

TTL storms

Setting 100,000 keys to expire at exactly the same instant means Redis spends a chunk of its event loop on expiration the moment the deadline hits. The active expire cycle catches up, but you'll see latency spikes in LATENCY HISTORY expire-cycle.

Spread expiries with random jitter:

PY
import random
# One hour plus up to 5 minutes of jitter; r, key, and value come from the caller
ttl = 3600 + random.randint(0, 300)
r.set(key, value, ex=ttl)

A few percent of jitter is enough to break up the storm; the 300 seconds here is about 8% of the base TTL.

Avoid full-collection reads on big keys

# Disastrous on a million-element list
LRANGE big-list 0 -1
SMEMBERS big-set
HGETALL big-hash

Paginate or scan:

LRANGE big-list 0 99
SSCAN big-set 0 COUNT 100
HSCAN big-hash 0 COUNT 100

This is the #1 production pattern that breaks Redis. Code review for these.

Step 7: Configuration that prevents outages

Most "Redis is slow" reports come down to memory pressure or the wrong eviction policy. The defaults are conservative for safety, not performance.

maxmemory and eviction policy

Without maxmemory, Redis grows until the kernel kills it. Always set a limit:

maxmemory 8gb
maxmemory-policy allkeys-lru

Eviction policies in plain English:

Policy | Behavior
noeviction | Errors on writes when memory is full. This is the Redis default; safe only when the dataset is sized to fit.
allkeys-lru | Evicts least-recently-used keys regardless of TTL. The usual choice for caches.
allkeys-lfu | Evicts least-frequently-used keys. Better for skewed access patterns (a few hot keys, many cold).
volatile-lru | Evicts LRU only among keys with a TTL. Use when you have a mix of durable and cache data.
volatile-lfu | Same as above, with LFU.
volatile-ttl | Evicts the soonest-to-expire key.
allkeys-random | Random eviction. Rarely the right choice.
volatile-random | Same, scoped to keys with a TTL.

The choice that catches most teams off guard: volatile-lru requires you to set TTLs on the keys you want to be eligible for eviction. If you forget, the cache fills with non-TTL keys and writes start erroring with OOM command not allowed when used memory > 'maxmemory'.

For a cache-only Redis: allkeys-lru or allkeys-lfu. For a mixed durable/cache instance: volatile-lru with discipline about TTL-setting on the cache half. Better: separate Redis instances for cache and durable workloads. Mixing them is a perpetual operational headache.

Lazy freeing

Deleting a huge key with DEL is O(N) and blocks the server. Enable lazy freeing so the memory reclamation happens in a background thread:

lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
lazyfree-lazy-user-del yes

For application-initiated deletes of potentially-big keys, prefer UNLINK over DEL. UNLINK is always lazy.

Persistence: AOF vs RDB

The choice has steady-state and incident performance implications.

RDB: periodic snapshots via fork. The fork itself can stall the main thread on a large dataset while the kernel copies page tables, and copy-on-write page faults add latency afterwards under heavy writes. Worst case: hundreds of milliseconds of stall. The save file is compact and good for backups.

AOF: append every write to a log. Continuous I/O cost. appendfsync everysec is the common setting (lose at most 1 second of writes on a power failure). appendfsync always flushes every command and is dramatically slower. AOF rewrites also fork, so the same caveat applies.

Both together: AOF is the source of truth for recovery, RDB for backup snapshots. Common setup, but you pay both costs.

Neither: for pure caches where the data can be regenerated. Disable both:

save ""
appendonly no

For durable workloads, AOF with appendfsync everysec is the typical compromise. Keep aof-rewrite-incremental-fsync and rdb-save-incremental-fsync set to yes (the default) to avoid stalls on the fsync of the rewrite.

Tune fork behavior

For environments where fork stalls are a problem, two options:

  1. Disable transparent huge pages on the host. Same advice as Postgres:

    BASH
    echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
    

    THP causes longer copy-on-write page faults during fork.

  2. Use diskless replication and BGSAVE less often. repl-diskless-sync yes lets full resyncs stream over the network without writing to disk first. Saves I/O at the cost of the network connection holding open longer.

I/O threads

Single-threaded command processing is Redis's defining feature, but Redis 6+ can run I/O (parsing and writing) in worker threads. On high-throughput servers with small commands and many connections:

io-threads 4
io-threads-do-reads yes

io-threads 4 is reasonable on an 8-core box. Going past that has diminishing returns. Don't enable I/O threads unless you've measured a benefit; for low-traffic instances they add overhead.

Active defragmentation

For long-running Redis instances with lots of write churn, memory fragmentation builds up. The mem_fragmentation_ratio in INFO memory is the key metric. Above 1.5, enable active defrag:

activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 100
active-defrag-cycle-min 25
active-defrag-cycle-max 75

Active defrag moves allocations around in memory to coalesce free pages. It works incrementally, but it takes CPU time from the main thread, bounded by the cycle-min and cycle-max percentages. Tune the cycle settings to your tolerance; the defaults are conservative.

tcp-keepalive

tcp-keepalive 60

Sends a keepalive probe every 60 seconds on idle connections. Helps detect dead connections that the OS hasn't noticed. The default of 300 is fine for most setups; tighten if you're seeing stale connections build up.

Disable commands you don't want

Several commands are dangerous enough that you may want to disable them in production: CONFIG, DEBUG, FLUSHALL, FLUSHDB, KEYS, and SHUTDOWN. Use rename-command to rename them to something unguessable, or to an empty string (effectively disabling them):

rename-command KEYS ""
rename-command FLUSHALL ""
rename-command CONFIG "CONFIG_a1b2c3d4"

This is a defense against bad code or accidental commands, not security. Better: separate admin and application credentials with ACLs (Redis 6+).

Step 8: Common slow query patterns

Big keys (already covered)

The single most common production pathology. Find with --bigkeys, fix by decomposing.

Hot keys (already covered)

A single key handling a large fraction of traffic. Fix by sharding the key (for counters), application-side caching, or moving off Redis.

KEYS in production

It keeps happening. Every Redis postmortem has at least one. Even when you've told the team not to. Code review for it explicitly.

Big values inside MULTI/EXEC

Transactions queue commands and run them atomically. If you queue an LRANGE big-list 0 -1 inside MULTI, the entire transaction blocks the server for the duration of that read. Watch for transactions that include any O(N) command.

Slow Lua scripts

Lua runs on the main thread. A script that does 100,000 operations stalls the server for as long as it takes. Keep scripts bounded; use SCRIPT KILL to abort a runaway script (only works on non-write scripts).

Slow client

A client that doesn't read fast enough can fill the server's output buffer and either get disconnected or back up the server. Check CLIENT LIST for connections with a non-zero oll (output list length) or a large omem (output buffer memory). The fix is on the client side: read faster, or raise client-output-buffer-limit for that client type (cautiously).

Pub/Sub fan-out

Publishing one message to 10,000 subscribers means 10,000 writes on the server's main thread. For large fan-out, prefer Streams with consumer groups, which let consumers pull at their own pace.

Replication-induced stalls

A new replica triggers a full sync, which forks the primary. On a large dataset, the fork itself can take seconds. Check LATENCY HISTORY fork and schedule replica bootstraps off-peak. Diskless replication helps; disabling RDB entirely helps more.

Cluster resharding

Moving slots between nodes during a resharding operation generates load on both source and destination. Schedule during low-traffic windows. Use redis-cli --cluster reshard with --cluster-yes only if you've planned the slot movement.

Cross-AZ network latency

The most under-recognized "slow Redis" problem. Application in us-east-1a, Redis in us-east-1b: 1-2ms RTT minimum. 1000 serial commands becomes 1-2 seconds purely from network. Pipeline, or co-locate. There's no Redis-side fix for application-side network round-trips.
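The arithmetic is worth writing down once. This sketch computes a lower bound on time spent waiting on the network, ignoring server execution time entirely. The 1.5ms RTT and the batch size of 100 are example values, not measurements.

```python
import math

def network_time_ms(commands, rtt_ms, batch_size=1):
    """Lower bound on network wait: one round-trip per batch of commands."""
    round_trips = math.ceil(commands / batch_size)
    return round_trips * rtt_ms

serial = network_time_ms(1000, rtt_ms=1.5)                     # one command per trip
pipelined = network_time_ms(1000, rtt_ms=1.5, batch_size=100)  # 100 commands per trip
print(serial, pipelined)
```

With these inputs the serial loop spends 1.5 seconds on the wire and the pipelined version 15 milliseconds, before Redis has executed anything.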

Step 9: When Redis isn't enough

Sometimes the workload needs more than one Redis can give.

Redis Cluster

For sharding across multiple primaries, Redis Cluster is the built-in answer. It shards keys across nodes by hash slot. The trade-offs:

  • Multi-key operations only work for keys on the same slot (use hash tags).
  • Some commands aren't supported in cluster mode.
  • Failover is automatic but takes seconds.

The "one big Redis vs Redis Cluster" decision is mostly about data size and write throughput. If your working set fits on one machine, one Redis is simpler.

Replication for read scaling

For read-heavy workloads, a primary plus N read replicas spreads load. Set replica-read-only yes (the default) and route reads to replicas. Replication is asynchronous, so reads may see slightly stale data; size your tolerance.

Persistence-only replicas

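A pattern worth knowing: run a primary with persistence off (no RDB, no AOF) for max performance, and a replica with persistence on. The replica handles durability, the primary handles speed. If the primary crashes, fail over to the replica. Make sure the primary is not configured to restart automatically: if it comes back empty, the replicas will sync from it and wipe their copy of the data.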

KeyDB and Dragonfly as alternatives

For workloads that genuinely push Redis past its single-threaded limit, two alternatives are worth evaluating:

  • KeyDB is a multi-threaded fork of Redis that maintains protocol compatibility. Worth a look for high-throughput single-instance workloads.
  • Dragonfly is a from-scratch reimplementation with multi-threaded design, claiming 25x performance improvements on some benchmarks. More disruptive to drop in, but interesting.

Most teams don't need either. The right answer is usually "fix the access patterns and use one Redis instance per workload type."

The tools worth installing

Tool | What it does | Cost
SLOWLOG, LATENCY | Built-in slow op and event tracking | Free, built in
redis-cli --bigkeys | Sample biggest keys per type | Free, built in
redis-cli --hotkeys | Find frequently-accessed keys | Free, built in (LFU required)
redis-cli --latency | End-to-end latency from the client | Free, built in
RedisInsight | Official GUI with slow-log and memory analyzer | Free
rdb-tools | Offline RDB analysis | Free
redis_exporter + Grafana | Metrics for production monitoring | Free
RedisBloom, RedisJSON, RediSearch | Module extensions for specific use cases | Free (part of Redis Stack)
memtier_benchmark | Load testing | Free
redis-shake | Migration and sync tool | Free

For most teams, the right starting set is: SLOWLOG and LATENCY thresholds set sanely, redis_exporter feeding Grafana, and RedisInsight for occasional manual investigation. That gets you 95% of what you need.

Quick checklist

When you encounter slow Redis:

  1. Check SLOWLOG for individual commands above your threshold.
  2. Run LATENCY DOCTOR to surface server-level stalls (forks, evictions, AOF, expire cycles).
  3. Check INFO commandstats for hot commands with high usec_per_call.
  4. Run redis-cli --bigkeys on a replica and look for outliers.
  5. Check INFO memory for mem_fragmentation_ratio and used_memory_peak.
  6. Check CLIENT LIST for connections with a large oll or omem (slow consumers) and INFO clients for the connection count.
  7. Verify maxmemory and eviction policy are set sanely for your workload.
  8. Audit application for KEYS, MONITOR, HGETALL on big hashes, LRANGE 0 -1. These are the patterns that bring Redis down.
  9. Pipeline or use multi-value commands where you have many small operations.

Most slow Redis workloads come down to a single mistake on a single key. Find the command in SLOWLOG, identify the key, check its size, and pick a data structure or access pattern that turns the operation from O(N) into O(1) or O(log N). The vast majority of Redis performance work is structural, not configurational.
