AWS Redis (ElastiCache): Setup and Best Practices
Amazon ElastiCache for Redis is AWS's managed Redis service. It handles provisioning, patching, replication, and failover so you don't have to run Redis on EC2 yourself. You get a Redis endpoint and AWS takes care of the infrastructure underneath.
The main reason it exists: running Redis yourself on EC2 means you're responsible for OS updates, replication setup, failure detection, and node replacement. ElastiCache offloads all of that.
Two deployment modes
ElastiCache offers two ways to run Redis: Serverless and a self-managed cluster.
ElastiCache Serverless
Serverless is the simpler option. You create a cache and AWS handles scaling, replication, and capacity automatically. There are no nodes to size, no shards to configure, and no cluster topology to manage.
You pay for what you use: data stored (per GB-hour) and compute, billed in ElastiCache Processing Units (ECPUs), a request-based unit in which each ECPU covers roughly 1 KB of data read or written. It scales up and down without intervention.
Serverless works well for variable workloads, development environments, or cases where you want minimal configuration. The trade-off is less control and potentially higher costs at consistent high load compared to right-sized provisioned nodes.
Self-managed cluster
The traditional mode gives you control over node type, shard count, replica count, and cluster topology. You pick from the cache.r7g, cache.m7g, and cache.t4g instance families. Larger nodes mean more memory and more throughput.
This mode is better for predictable production workloads where you want to optimize cost, or when you need specific configurations like particular instance sizes or reserved instance pricing.
Cluster mode disabled vs. cluster mode enabled
Within the self-managed path, you choose between two cluster configurations.
Cluster mode disabled gives you a single shard with one primary node and up to 5 read replicas. All writes go to the primary, all reads can be distributed across replicas. This is the simpler option: a single endpoint for writes, an optional reader endpoint for reads.
Cluster mode enabled partitions your keyspace across multiple shards (up to 500). Each shard has its own primary and replicas. This lets you scale beyond the memory and throughput limits of a single node.
The trade-off with cluster mode is operational complexity. Multi-key commands (MGET, MSET) and transactions only work when all keys map to the same shard. You can use hash tags ({user}.session and {user}.profile always land on the same shard) to work around this, but it requires thought upfront.
For most applications, cluster mode disabled is enough. Use cluster mode when your dataset exceeds the memory of your largest available node (currently around 420 GB for cache.r7g.16xlarge).
| | Cluster mode disabled | Cluster mode enabled |
|---|---|---|
| Shards | 1 | 1-500 |
| Max memory | Single node limit | Up to 500x node limit |
| Multi-key commands | Full support | Same shard only |
| Online resharding | No | Yes |
| Failover | Automatic (replica promotes) | Automatic per shard |
Creating a cluster
Via the AWS console
- Go to ElastiCache in the AWS console and click Create cluster
- Select Redis OSS (or Valkey if you prefer the open-source fork)
- Choose Serverless or Design your own cache
- Set a name, description, and Redis version
- Choose your node type and number of replicas
- Select your VPC and subnets
- Assign a security group
- Configure backups, maintenance window, and encryption settings
- Click Create
Via the AWS CLI
This creates a primary with one replica, encryption at rest and in transit, and automatic failover enabled.
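A sketch of the command; the group name, subnet group, security group ID, engine version, and node type are placeholders to adjust for your environment:

```bash
# All names and IDs below are placeholders.
aws elasticache create-replication-group \
  --replication-group-id my-redis-cluster \
  --replication-group-description "Primary with one replica" \
  --engine redis \
  --engine-version 7.1 \
  --cache-node-type cache.r7g.large \
  --num-cache-clusters 2 \
  --automatic-failover-enabled \
  --multi-az-enabled \
  --at-rest-encryption-enabled \
  --transit-encryption-enabled \
  --cache-subnet-group-name my-redis-subnets \
  --security-group-ids sg-0123456789abcdef0
```

`--num-cache-clusters 2` means one primary plus one replica in a cluster-mode-disabled group.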
To create the subnet group first (required if you don't have one):
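A minimal sketch, with placeholder subnet IDs spanning two availability zones:

```bash
# Subnet IDs are placeholders; pick private subnets in at least two AZs.
aws elasticache create-cache-subnet-group \
  --cache-subnet-group-name my-redis-subnets \
  --cache-subnet-group-description "Private subnets across two AZs" \
  --subnet-ids subnet-0aaa11112222bbbb3 subnet-0ccc33334444dddd5
```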
VPC and security group configuration
ElastiCache nodes are not publicly accessible. They must run inside a VPC and can only be reached by resources in the same VPC (or connected VPCs via peering or Transit Gateway).
Your security group needs an inbound rule allowing TCP on port 6379 from your application's security group (not from 0.0.0.0/0). The recommended approach:
Inbound rule:
Type: Custom TCP
Port: 6379
Source: <your app security group ID>
Referencing the app security group ID instead of a CIDR range means you don't need to update the rule when your application scales or its IP changes.
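The same rule can be added from the CLI; both security group IDs below are placeholders:

```bash
# sg-0cache... is the cache's security group, sg-0app... the application's.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0cache0000000000000 \
  --protocol tcp \
  --port 6379 \
  --source-group sg-0app000000000000000
```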
For the subnet group, include subnets from at least two availability zones. ElastiCache will place replicas across AZs automatically when Multi-AZ is enabled.
Connecting to your cache
ElastiCache provides a primary endpoint for writes and (when replicas exist) a reader endpoint for reads.
Example endpoints:
Primary: my-redis-cluster.abc123.ng.0001.use1.cache.amazonaws.com:6379
Reader: my-redis-cluster-ro.abc123.ng.0001.use1.cache.amazonaws.com:6379
Connecting from Node.js
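A minimal sketch using the ioredis client; the endpoint is the placeholder primary endpoint from above, and `tls` is required when in-transit encryption is enabled:

```javascript
const Redis = require("ioredis");

// Placeholder endpoint; use your cluster's primary endpoint.
const redis = new Redis({
  host: "my-redis-cluster.abc123.ng.0001.use1.cache.amazonaws.com",
  port: 6379,
  tls: {}, // required when in-transit encryption is enabled
  retryStrategy: (times) => Math.min(times * 200, 2000), // backoff on reconnect
});

async function main() {
  await redis.set("greeting", "hello", "EX", 60); // 60-second TTL
  console.log(await redis.get("greeting"));
}

main().catch(console.error);
```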
Connecting from Python
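A minimal sketch with redis-py; the endpoint is the placeholder primary endpoint from above, and `ssl=True` is needed when in-transit encryption is enabled:

```python
import redis

# Placeholder endpoint; use your cluster's primary endpoint.
r = redis.Redis(
    host="my-redis-cluster.abc123.ng.0001.use1.cache.amazonaws.com",
    port=6379,
    ssl=True,  # required when in-transit encryption is enabled
    decode_responses=True,
)

r.set("greeting", "hello", ex=60)  # 60-second TTL
print(r.get("greeting"))
```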
For cluster mode enabled, use a cluster-aware client:
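For example, redis-py's RedisCluster (the configuration endpoint shown is a placeholder) discovers the shard topology and routes each command to the right shard:

```python
from redis.cluster import RedisCluster

# Placeholder configuration endpoint for a cluster-mode-enabled cache.
rc = RedisCluster(
    host="my-redis-cluster.abc123.clustercfg.use1.cache.amazonaws.com",
    port=6379,
    ssl=True,
    decode_responses=True,
)

# Hash tags keep related keys on the same shard, so multi-key ops still work.
rc.mset({"{user:42}.session": "s1", "{user:42}.profile": "p1"})
```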
Multi-AZ and automatic failover
Enable --automatic-failover-enabled (and --multi-az-enabled) when creating your cluster. With these on, ElastiCache monitors the primary node and promotes a replica to primary if the primary becomes unavailable.
Failover typically completes in 60-90 seconds. DNS is updated to point to the new primary. Your application should handle connection retries with backoff to reconnect after failover.
One important detail: automatic failover requires at least one replica. A standalone primary (no replicas) cannot fail over.
For cluster mode enabled, each shard fails over independently. A single-shard failure doesn't affect other shards.
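The retry-with-backoff pattern mentioned above can be sketched client-agnostically in plain Python; the operation, retry counts, and delays are illustrative:

```python
import time

def with_backoff(operation, retries=5, base_delay=0.2, max_delay=5.0):
    """Retry `operation`, doubling the delay after each failure."""
    delay = base_delay
    for attempt in range(retries):
        try:
            return operation()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            time.sleep(delay)
            delay = min(delay * 2, max_delay)

# Example: fails twice (as during a failover), then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("primary unavailable")
    return "PONG"

print(with_backoff(flaky, base_delay=0.01))  # PONG
```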
Cost considerations
ElastiCache pricing has several components:
Node hours. Charged per hour for each running node. A cache.r7g.large (2 vCPU, 13 GB) costs around $0.166/hr (~$121/month) in us-east-1. With one replica, that doubles to ~$242/month.
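The node-hour math above, spelled out (730 hours per month, us-east-1 on-demand rate):

```python
hourly_rate = 0.166      # cache.r7g.large, us-east-1 on-demand
hours_per_month = 730

primary = hourly_rate * hours_per_month
print(round(primary))      # monthly cost of the primary alone
print(round(primary * 2))  # with one replica, the cost doubles
```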
Backup storage. Daily automated backups are free up to your cluster's data size. Additional backup storage is $0.085/GB-month.
Data transfer. In-region data transfer between EC2 and ElastiCache in the same AZ is free. Cross-AZ transfer costs $0.01/GB in each direction. This adds up when your application and cache are in different AZs.
Serverless pricing. Around $0.125/GB-hour for data stored, plus a compute charge billed in ECPUs. Storage dominates for steady datasets: 10 GB held around the clock is 10 × $0.125 × 730 hours, roughly $910/month before compute, which is why serverless suits small or spiky workloads rather than large steady ones.
To reduce cost on steady workloads, consider 1-year reserved node pricing, which saves 30-40% over on-demand rates.
Best practices
Set a maxmemory policy
ElastiCache nodes have fixed memory. When the cache fills up, Redis needs to know what to do. Configure this in your parameter group:
maxmemory-policy = allkeys-lru
allkeys-lru evicts the least recently used keys across all keys. This is the most common choice for pure caching workloads. If you're using Redis for both caching and persistent data (like session storage), volatile-lru only evicts keys that have a TTL set, leaving persistent keys alone.
Avoid noeviction unless you want Redis to return errors when memory is full. That will break your application in unexpected ways.
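The parameter can be set from the CLI; the group name is a placeholder (default parameter groups can't be modified, so create a custom one first):

```bash
# "my-redis-params" is a placeholder custom parameter group.
aws elasticache modify-cache-parameter-group \
  --cache-parameter-group-name my-redis-params \
  --parameter-name-values "ParameterName=maxmemory-policy,ParameterValue=allkeys-lru"
```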
Reserve memory headroom
ElastiCache sets maxmemory for you based on the node type, and you can't override it directly. What you can tune is reserved-memory-percent in the parameter group, which sets aside a slice of RAM for replication buffers, backups, and other overhead. The default is 25%, leaving roughly 75% of the node's memory for data, in line with the common guidance of keeping usage at 75-80%:
reserved-memory-percent = 25
This prevents Redis from running out of memory for replication buffers, which can cause connection drops.
Use connection pooling
Opening a new connection to Redis on every request is expensive. Use a connection pool in your application.
Size your pool based on the number of application threads or concurrent requests. Too small and requests queue up waiting for a connection. Too large and you'll hit ElastiCache's connection limit (visible in CloudWatch as CurrConnections).
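A sketch with redis-py's built-in pool; the endpoint is a placeholder and the `rediss://` scheme selects TLS for in-transit encryption:

```python
import redis

# Placeholder endpoint; "rediss://" enables TLS.
pool = redis.ConnectionPool.from_url(
    "rediss://my-redis-cluster.abc123.ng.0001.use1.cache.amazonaws.com:6379",
    max_connections=50,  # cap so CurrConnections stays bounded
)
r = redis.Redis(connection_pool=pool)
# Commands issued through `r` now reuse pooled connections.
```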
Always set TTLs
Keys without TTLs never expire. In a cache, this means the dataset grows until eviction kicks in and removes keys you might still want. Set expiration on every key you write:
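With redis-py, for instance (assuming `r` is a connected client; key names and values are illustrative):

```python
r.set("session:42", "serialized-session-data", ex=3600)  # expire in one hour
r.setex("otp:42", 300, "831204")                         # same idea: 5-minute TTL
r.expire("legacy-key", 86400)                            # add a TTL to an existing key
```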
Monitor with CloudWatch
ElastiCache publishes metrics to CloudWatch automatically. The most important ones to watch:
| Metric | What it tells you |
|---|---|
| CacheHits / CacheMisses | Cache effectiveness. A low hit rate means data isn't cached long enough or keys are evicted too early. |
| CurrConnections | Current client connections. Watch for spikes or sustained high values. |
| DatabaseMemoryUsagePercentage | How full the cache is. Set an alarm at 80%. |
| Evictions | Keys being evicted. High evictions mean the cache is too small for the working set. |
| ReplicationLag | How far behind replicas are. High lag means replicas may serve stale data. |
| EngineCPUUtilization | CPU usage on the Redis engine thread. Sustained high values indicate command bottlenecks. |
Set CloudWatch alarms on DatabaseMemoryUsagePercentage (alert at 80%) and Evictions (alert on sustained high values). These are the two most common production issues.
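A sketch of the memory alarm via the CLI; the alarm name, cluster ID, and SNS topic ARN are placeholders:

```bash
# Alarm name, cluster ID, and topic ARN are placeholders.
aws cloudwatch put-metric-alarm \
  --alarm-name redis-memory-high \
  --namespace AWS/ElastiCache \
  --metric-name DatabaseMemoryUsagePercentage \
  --dimensions Name=CacheClusterId,Value=my-redis-cluster-001 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
```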
ElastiCache vs. self-hosted Redis vs. Upstash
ElastiCache is not the only option for managed Redis. Here's how it compares:
| | ElastiCache | Self-hosted on EC2 | Upstash |
|---|---|---|---|
| Setup time | Minutes | Hours to days | Minutes |
| Management overhead | Low | High | Minimal |
| Pricing model | Per node-hour | EC2 + EBS costs | Per request |
| Minimum cost | ~$13/mo (t4g.micro) | ~$8/mo (t4g.nano EC2) | $0 (free tier) |
| Scale to zero | No (even Serverless) | No | Yes |
| Multi-region | Via Global Datastore | Manual setup | Yes (built-in) |
| VPC support | Yes | Yes | Via private link |
| AWS integration | Native | Manual | Limited |
| Max data size | Node memory | Node memory | 10 GB on free tier |
Choose ElastiCache when you're already running on AWS, need tight VPC integration, and want a managed service without thinking about pricing per request. It's the right default for production workloads on AWS.
Choose self-hosted Redis when you need full control over Redis configuration, want to run a specific version or fork, or your team has the operational capacity to manage it. Cost savings are real only if you're running at high utilization.
Choose Upstash when your workload is intermittent, you're building a side project, or you need a Redis-compatible store with a free tier and per-request pricing. Upstash also works well outside AWS (Vercel, Cloudflare Workers, etc.).
For most teams shipping on AWS, ElastiCache is the straightforward choice. Serverless is worth considering for variable or low-traffic workloads. For steady, high-throughput production use, provisioned nodes with reserved pricing are more cost-effective.