Customers are increasingly adopting multi-Region applications as they offer high availability, enhanced resilience, and business continuity, making them essential for organizations with demanding uptime requirements. Regulatory compliance also drives this trend, particularly in sectors like finance and healthcare, where specific workloads must be distributed across multiple AWS Regions to meet stringent legal obligations. Furthermore, customers with a global footprint are building multi-Region applications to minimize latency and deliver exceptional user experiences to their international customer base. While some organizations attempt to build their own multi-Region solutions, these often prove challenging to maintain and operate. Common issues include complex database replication and manual database failover procedures, difficulty in resolving data conflicts, and maintaining data consistency across regions. These challenges underscore the need for more robust, managed solutions that can simplify multi-Region application deployment and management.
On December 1, 2024, we announced the general availability of Amazon MemoryDB Multi-Region, a fully managed, active-active, multi-Region database that you can use to build applications with up to 99.999% availability, microsecond read latency, and single-digit millisecond write latency across multiple Regions. MemoryDB Multi-Region is supported with Valkey, a drop-in replacement for Redis Open Source Software (OSS) stewarded by the Linux Foundation. With MemoryDB Multi-Region, you can build highly available multi-Region applications for increased resiliency. It offers active-active replication so you can serve reads and writes locally from the Regions closest to your customers with ultra-low latency. In addition, with active-active replication, managing Region failure on the application side is much easier, because you don’t need to do a database failover. MemoryDB Multi-Region asynchronously replicates data between Regions, and data is typically propagated within a second. It automatically resolves conflicts and corrects data divergence issues.
For more information about getting started with MemoryDB Multi-Region, refer to Amazon MemoryDB Multi-Region is now generally available.
In this post, we cover the benefits of MemoryDB Multi-Region, how it works, its disaster recovery capabilities, the consistency and conflict resolution mechanisms, and how to monitor replication lag across Regions.
Benefits of MemoryDB Multi-Region
MemoryDB Multi-Region provides the following benefits to customers:
- High availability and disaster recovery – With MemoryDB Multi-Region, you can build applications with up to 99.999% availability. You will have your full stack application deployed in multiple Regions. The application users connect to the closest Region of your application for optimal latency. In case of a Regional isolation or degradation, the application user can connect to the application stack deployed in another Region with full read and write access to the data powered by MemoryDB Multi-Region, without a complex manual database failover process. After the impacted Region recovers, the application user reconnects to the original Region, where data has been automatically synced back from the other Regions.
- Microsecond read and single-digit millisecond write latency for multi-Region distributed applications – MemoryDB Multi-Region offers active-active replication, so you can serve both reads and writes locally from the Regions closest to your customers with microsecond read and single-digit millisecond write latency at any scale. It automatically replicates data asynchronously between Regions with data typically propagated in less than 1 second.
- Adherence to compliance and regulatory requirements where data needs to reside in a specific geography – Some compliance and regulatory requirements mandate that data stay within a specific geographic location. MemoryDB Multi-Region can help you meet these requirements because you choose the Regions in which your data resides.
MemoryDB Multi-Region overview
MemoryDB stores the entire dataset in memory and uses a distributed Multi-AZ transactional log to provide data durability, consistency, and recoverability. A Multi-Region cluster is a collection of one or more Regional MemoryDB clusters. Each MemoryDB Regional cluster stores the same set of data in a single Region. In a single-Region cluster, MemoryDB provides strong data consistency for primary nodes and guaranteed eventual consistency for replica nodes. In a Multi-Region cluster, MemoryDB uses eventual consistency across Regions. When an application writes data to any Regional cluster, MemoryDB automatically and asynchronously replicates that data to all the other Regional clusters within the Multi-Region cluster. Writes are durably stored in the local Multi-AZ transactional log before the write acknowledgment (ACK) is sent back to the client.
MemoryDB Multi-Region is built on a highly resilient replication infrastructure. To replicate data, each MemoryDB Regional cluster pulls the replication logs from the Multi-AZ transaction log of every other Regional cluster through the AWS cross-Region network. In addition, the replication logs pulled from other Regions are stored in a local Regional Multi-AZ transaction log. This allows Region-independent recovery, achieving 99.999% availability with MemoryDB Multi-Region. You can add Regional clusters to the Multi-Region cluster so that it can be available in additional Regions. You can expand your Multi-Region cluster to up to five Regions.
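The pull-based pattern described above can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not the MemoryDB implementation: the `RegionalCluster` class, its fields, and the `sync_all` helper are hypothetical names introduced for illustration, and a Python list stands in for the Multi-AZ transaction log.

```python
from dataclasses import dataclass, field


@dataclass
class RegionalCluster:
    """Toy stand-in for one Regional cluster (hypothetical, for illustration)."""
    name: str
    log: list = field(default_factory=list)      # local append-only log
    offsets: dict = field(default_factory=dict)  # peer name -> next index to read

    def write(self, key, value):
        # A local write is appended to the local log before it is acknowledged.
        self.log.append((self.name, key, value))

    def pull_from(self, peer):
        # Read the part of the peer's log we have not seen yet and keep only
        # the entries that originated at that peer, storing them locally.
        start = self.offsets.get(peer.name, 0)
        for origin, key, value in peer.log[start:]:
            if origin == peer.name:
                self.log.append((origin, key, value))
        self.offsets[peer.name] = len(peer.log)


def sync_all(clusters):
    # Every cluster pulls from every other cluster (a full mesh).
    for cluster in clusters:
        for peer in clusters:
            if peer is not cluster:
                cluster.pull_from(peer)


regions = [RegionalCluster("region-1"), RegionalCluster("region-2"), RegionalCluster("region-3")]
regions[0].write("k1", "v1")
regions[1].write("k2", "v2")
regions[2].write("k3", "v3")
sync_all(regions)
```

Because each cluster stores the pulled entries in its own log, it can rebuild its state from local storage alone, which is the property the Region-independent recovery described above relies on.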
One of the key features of MemoryDB Multi-Region is that it offers active-active replication. This means that each Region in the cluster can accept write operations, enabling applications to write data simultaneously in multiple Regions. The active-active setup facilitates two-way replication between Regions, ensuring that data remains consistent across the entire Multi-Region cluster. This helps you to serve both reads and writes locally from the Regions closest to your customers with microsecond read and single-digit millisecond write latency at any scale.
Use case: Disaster recovery with MemoryDB Multi-Region
MemoryDB Multi-Region can help customers build multi-Region applications for disaster recovery. The advantage of using MemoryDB Multi-Region for disaster recovery is that with active-active replication, managing Region failure on the application side is much easier, because you don’t need to do a database failover. The application continues to read and write in the other Regions. MemoryDB Multi-Region provides disaster recovery with a near-zero recovery time objective (RTO).
Let’s explore how you can achieve disaster recovery with MemoryDB Multi-Region. The following architecture illustrates a three-node (one primary, two read replicas) MemoryDB cluster in Regions 1 and 2.
The MemoryDB clusters operate in separate Regions, each accepting reads and writes independently. MemoryDB Multi-Region replicates data across all clusters, and the system routes traffic to the nearest cluster for optimal latency. In the unlikely event that an application becomes inaccessible in one Region, MemoryDB Multi-Region still keeps the latest data synchronized across the remaining Regions.
The failover process consists of the following steps:
- The Amazon Route 53 health monitor detects that the application in a Region is inaccessible.
- Route 53 automatically removes the failed Region’s endpoint from the DNS record.
- Applications previously connected to the failed endpoint experience a timeout, trigger a new DNS resolution, and receive a new cluster endpoint of the new Region.
- The application reconnects and resumes operations with the latest synchronized data from MemoryDB, providing continuity.
This entire failover process is fully automated, requiring no manual intervention from operators, providing high availability with minimal operational overhead.
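The routing decision in the steps above can be sketched in a few lines of Python. This is a simplified model rather than Route 53 itself: the `resolve_endpoint` function, the Region names, and the endpoint host names are all hypothetical, and a plain set stands in for the health check results.

```python
def resolve_endpoint(records, healthy_regions):
    """Return the endpoint of the first Region that passes its health check.

    records: ordered list of (region, endpoint) pairs, nearest Region first.
    healthy_regions: set of Region names currently passing health checks.
    """
    for region, endpoint in records:
        if region in healthy_regions:
            return endpoint
    raise RuntimeError("no healthy Region available")


# Hypothetical records for an application deployed in two Regions.
records = [
    ("region-1", "app.region-1.example.com"),
    ("region-2", "app.region-2.example.com"),
]

# Normal operation: the nearest Region is returned.
primary = resolve_endpoint(records, {"region-1", "region-2"})

# Region 1 fails its health check: the next resolution returns Region 2.
fallback = resolve_endpoint(records, {"region-2"})
```

In a real deployment the client would call its resolver again after a connection timeout. Because every Regional cluster accepts writes, no database-side promotion step is needed once the new endpoint is returned.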
During such a failover, traffic shifts from the failed Region to the healthy Region. This shift can last for a few minutes. During this period, some requests are served in the failed Region while others are served in the healthy Region. Because these requests target the same set of user data, the same key can be modified concurrently in both Regions. This causes write-write conflicts that, left unresolved, would lead to data inconsistency. To resolve such conflicts, MemoryDB Multi-Region uses a last-writer-wins strategy. We cover conflict resolution in more detail later in this post.
By implementing MemoryDB Multi-Region, you benefit from high availability and data consistency, even during Regional outages. Your application keeps operating and preserves data integrity without manual intervention.
Consistency and conflict resolution
MemoryDB Multi-Region uses Conflict-free Replicated Data Types (CRDTs) to reconcile conflicting concurrent writes. Conflict resolution is fully managed and happens in the background without any impact on the application’s availability. A CRDT is a type of data structure that can be updated independently and concurrently without coordination. Write-write conflicts are merged independently in each Region with eventual consistency. CRDTs allow MemoryDB to simplify its cross-Region replication architecture by resolving conflicts without requiring inter-Region coordination or one-time additional replication workloads. MemoryDB Multi-Region also uses two levels of Last Writer Wins (LWW) to resolve conflicts. For the String data type, LWW resolves conflicts at the key level. For other data types, LWW resolves conflicts at the sub-key level.
Let’s explore a few scenarios with different data structures and how MemoryDB Multi-Region handles conflicts.
Scenario 1: Concurrent writes with LWW conflict resolution (String data type)
Region A executes SET K V1 at timestamp T1; Region B executes SET K V2 at timestamp T2. After replication, both Regions A and B will have key K with value V2. When different Regions concurrently update the same key with the String data type, MemoryDB Multi-Region uses LWW at the key level to resolve the conflict.
Time | Region A | Region B |
T1 | SET K V1 | |
T2 | | SET K V2 |
T3 | Sync | Sync |
T4 | K: V2 | K: V2 |
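Key-level LWW can be sketched as a merge function over timestamped values. This is a minimal illustration under stated assumptions, not MemoryDB's internal logic: the `Versioned` type and `lww_merge` function are hypothetical, timestamps are plain integers, and breaking ties by Region name is an assumption made here so that the merge is deterministic.

```python
from typing import NamedTuple


class Versioned(NamedTuple):
    timestamp: int  # time of the write (T1, T2, ... as integers)
    region: str     # assumed tie-breaker when timestamps are equal
    value: str


def lww_merge(local: Versioned, remote: Versioned) -> Versioned:
    # Tuples compare field by field, so the later timestamp wins and the
    # Region name breaks ties the same way in every Region.
    return max(local, remote)


write_a = Versioned(timestamp=1, region="A", value="V1")  # SET K V1 in Region A
write_b = Versioned(timestamp=2, region="B", value="V2")  # SET K V2 in Region B

# Each Region merges the remote write with its own copy.
region_a_view = lww_merge(write_a, write_b)
region_b_view = lww_merge(write_b, write_a)
```

Because `max` gives the same answer regardless of argument order, both Regions converge on the write made at T2 without coordinating with each other.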
Scenario 2: Concurrent writes without sub-key level conflict (Hash data type)
Region A sets a field in hash key K with HSET K F1 V1 at timestamp T1. Region B executes HSET K F2 V2 at timestamp T2. After replication, both Regions A and B will have key K with both fields. When different Regions concurrently update different sub-keys in the same collection, the updates don’t conflict, and operations such as HSET are associative.
Time | Region A | Region B |
T1 | HSET K F1 V1 | |
T2 | | HSET K F2 V2 |
T3 | Sync | Sync |
T4 | K: {F1:V1, F2:V2} | K: {F1:V1, F2:V2} |
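Sub-key-level LWW for a hash can be sketched by keeping a timestamp per field and merging field by field. This is an illustrative model only: the `merge_hash` function and the `(timestamp, value)` representation are assumptions made for this sketch, not the format MemoryDB uses internally.

```python
def merge_hash(local, remote):
    """Merge two hashes whose fields map to (timestamp, value) pairs.

    Fields present on only one side are kept as-is. When both sides hold the
    same field, the entry with the larger (timestamp, value) tuple wins.
    """
    merged = dict(local)
    for field_name, entry in remote.items():
        if field_name not in merged or entry > merged[field_name]:
            merged[field_name] = entry
    return merged


region_a = {"F1": (1, "V1")}  # HSET K F1 V1 at T1 in Region A
region_b = {"F2": (2, "V2")}  # HSET K F2 V2 at T2 in Region B

# Different fields do not conflict, so both Regions end up with both fields.
after_sync_a = merge_hash(region_a, region_b)
after_sync_b = merge_hash(region_b, region_a)

# If both Regions write the same field, the later write wins for that field only.
conflict = merge_hash({"F1": (1, "V1"), "F2": (2, "V2")}, {"F1": (3, "V3")})
```

The same per-element approach carries over to other collection types: only the element that was written concurrently is subject to LWW, and the rest of the collection merges without loss.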
Scenario 3: Concurrent writes with LWW conflict resolution (Set data type)
Similar to other collection data types, the Set data type also uses LWW at the sub-key level for conflict resolution. If there are no conflicts at the sub-key level, operations such as SADD (add a member to a set) are associative.
Time | Region A | Region B |
T1 | SADD K V1 | |
T2 | | SADD K V2 |
T3 | Sync | Sync |
T4 | K: {V1, V2} | K: {V1, V2} |
Scenario 4: Concurrent key update and key deletion (Any data type)
When the key exists in the Region before the deletion operation, the delete operation is replicated and the effect is seen on both Regions. With the update operation at T4 using the command SADD K V2, both Regions would have the same value of V2 after replication.
Time | Region A | Region B |
T1 | SADD K V1 | |
T2 | Sync | |
T3 | DEL K | |
T4 | SADD K V2 | |
T5 | Sync | |
T6 | K: {V2} | K: {V2} |
Monitoring
MemoryDB offers native integrations with Amazon CloudWatch to provide a high level of observability to relevant metrics. CloudWatch collects raw data and processes it into readable, near real-time metrics. You can also set alarms that watch for certain thresholds, and send notifications or take actions when those thresholds are met. With the launch of MemoryDB Multi-Region, a key update is the introduction of a new metric, MultiRegionClusterReplicationLag, which measures the elapsed time between when an update is written to the remote Multi-Region Regional cluster Multi-AZ transaction log, and when that update is written to the primary node in the local Multi-Region Regional cluster. The metric is expressed in milliseconds and is emitted for every source- and destination-Region pair at shard level.
The following is a sample screenshot of this new metric.
You can also view this new MultiRegionClusterReplicationLag metric on the CloudWatch console.
During normal operation, this metric should be constant. An elevated value for it might indicate that updates from one Regional cluster are not propagating to other Regional clusters in a timely manner. MultiRegionClusterReplicationLag can also increase if a Region becomes isolated or degraded and you have a Regional cluster in that Region. In this case, you can temporarily redirect your application’s read and write activity to a different healthy Region.
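One way to act on an elevated value is a CloudWatch alarm on this metric. The following sketch only builds the parameters for the CloudWatch `PutMetricAlarm` API. It rests on several assumptions: the `replication_lag_alarm_params` helper is hypothetical, the namespace is assumed to be AWS/MemoryDB, the one-second threshold and evaluation settings are illustrative, and the dimensions are placeholders that you would copy from the metric as it appears in your CloudWatch console.

```python
def replication_lag_alarm_params(alarm_name, dimensions, threshold_ms=1000.0):
    """Build keyword arguments for CloudWatch PutMetricAlarm.

    dimensions: list of {"Name": ..., "Value": ...} dicts copied from the
    metric in your account (the exact dimension names are not assumed here).
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/MemoryDB",  # assumed namespace; verify in your account
        "MetricName": "MultiRegionClusterReplicationLag",
        "Dimensions": dimensions,
        "Statistic": "Average",
        "Period": 60,                 # seconds per datapoint
        "EvaluationPeriods": 5,       # alarm after 5 consecutive breaching periods
        "Threshold": threshold_ms,    # the metric is reported in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = replication_lag_alarm_params(
    alarm_name="memorydb-multi-region-replication-lag",
    dimensions=[{"Name": "ExampleDimension", "Value": "example-value"}],  # placeholder
)
```

With AWS credentials configured, these parameters could be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`. Requiring several consecutive periods above the threshold avoids alarming on a single short spike.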
Summary
In this post, we introduced Amazon MemoryDB Multi-Region, its benefits, and how it works. We discussed a disaster recovery use case, the consistency semantics, the conflict resolution mechanism, and monitoring of MemoryDB Multi-Region.
To learn how to get started with MemoryDB Multi-Region, refer to the Amazon MemoryDB documentation. If you have follow-up questions or feedback, leave a comment.
About the Authors
Karthik Konaparthi is a Principal Product Manager on the Amazon In-Memory Databases team and is based in Seattle, WA. He is passionate about all things data and spends his time working with customers to understand their requirements and build exceptional products.
Lakshmi Peri is a Sr. Solutions Architect on Amazon ElastiCache and Amazon MemoryDB. She has more than a decade of experience working with various NoSQL databases and architecting highly scalable applications with distributed technologies. Lakshmi has a particular focus on vector databases and AI recommendation systems.