Using RDS Proxy with Amazon RDS Multi-AZ DB instance deployment to improve planned failover time

Amazon Relational Database Service (Amazon RDS) Multi-AZ deployments provide a simple and effective solution for achieving high availability (HA) for databases. Amazon RDS Multi-AZ deployments can have one or two standby DB instances. When the deployment has one standby DB instance, it's called a Multi-AZ DB instance deployment, which is the focus of this post.

When you enable Multi-AZ DB instance deployment configuration, Amazon RDS creates a fully synchronized, redundant standby instance in another Availability Zone (AZ) to maintain business continuity in case of AZ failure. If your primary DB instance experiences issues with network connectivity, compute unit failure or storage failure, RDS detects the failure and automatically promotes the standby instance to the primary role. This process, known as a failover, helps maintain availability.

Failovers can be categorized as planned or unplanned:

Planned failovers occur during administrative actions such as upgrading the operating system (OS) or modifying the instance class. You can also invoke a planned failover manually for disaster recovery purposes, either through the Amazon RDS console or with the AWS CLI command reboot-db-instance and its --force-failover option.
Unplanned failovers are invoked by unexpected issues such as loss of network connectivity, compute unit failure, or storage failure on the primary.
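As a minimal sketch of the manual invocation described above, the following Python function requests a reboot with failover through an RDS client object. It assumes a client that behaves like boto3.client("rds"), whose reboot_db_instance call accepts DBInstanceIdentifier and ForceFailover. The function name force_planned_failover and the instance identifier in the usage comment are hypothetical.

```python
from typing import Any


def force_planned_failover(rds_client: Any, db_instance_id: str) -> str:
    """Request a reboot with failover and return the reported instance status.

    rds_client is assumed to behave like boto3.client("rds").
    ForceFailover=True is only meaningful for Multi-AZ deployments.
    """
    response = rds_client.reboot_db_instance(
        DBInstanceIdentifier=db_instance_id,
        ForceFailover=True,
    )
    return response["DBInstance"]["DBInstanceStatus"]


# Example usage (assumes boto3 is installed and AWS credentials are configured;
# "my-multi-az-instance" is a placeholder identifier):
#   import boto3
#   status = force_planned_failover(boto3.client("rds"), "my-multi-az-instance")
```

Passing the client in as a parameter keeps the function easy to exercise against a stub before you point it at a real DB instance.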

In this post, we demonstrate the reduction in planned failover downtime for Multi-AZ DB instance deployments used with Amazon RDS Proxy, the result of several optimizations made by Amazon RDS.

Achieving HA through Amazon RDS Multi-AZ DB instance deployment with RDS Proxy

In an Amazon RDS Multi-AZ DB instance deployment, the primary instance (shown in yellow in the following figure) handles read/write traffic, and the standby instance (shown in red) remains on standby, ready to take over if needed.

The following diagram illustrates an Amazon RDS Multi-AZ DB instance deployment operating in its normal connected state. In this configuration, two active Amazon Elastic Compute Cloud (Amazon EC2) instances run in separate Availability Zones. Each instance manages a set of Amazon Elastic Block Store (Amazon EBS) volumes containing a full copy of the data, and a storage-level replication layer connects the primary instance's volumes to the standby instance's EBS volumes.

The database application (DB APP, shown in green in the preceding figure) uses DNS (shown in orange) to retrieve the address of the current external endpoint providing access to the data. In this example, DNS is directing the application (DB APP) to the primary instance, serving the primary copy of the data that is available in Availability Zone 1.

In the event of a failure, Amazon RDS automatically switches the roles of the primary and standby instances and updates the IP address associated with the database's DNS hostname. This allows client applications to keep their connection settings during failover. This process, known as DNS propagation, can take up to 35 seconds to complete.
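To observe the DNS update described above from a client's point of view, you could poll the endpoint hostname until it resolves to a different address. The following is a rough sketch under stated assumptions: the function name wait_for_dns_change and its parameters are hypothetical, the default resolver is the standard library's socket.gethostbyname, and any caching in your local resolver will affect the measured time.

```python
import socket
import time
from typing import Callable, Optional


def wait_for_dns_change(
    hostname: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    interval: float = 1.0,
    timeout: float = 120.0,
) -> Optional[float]:
    """Return seconds until hostname resolves to a new address, or None on timeout."""
    start = clock()
    original = resolve(hostname)  # address before the failover
    while clock() - start < timeout:
        sleep(interval)
        try:
            current = resolve(hostname)
        except OSError:
            # Resolution can fail transiently; keep polling.
            continue
        if current != original:
            return clock() - start
    return None
```

The resolver, clock, and sleep functions are injectable so the loop can be checked without a real failover. The result is only as precise as the polling interval.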

RDS Proxy avoids the DNS propagation delay of up to 35 seconds because it continuously monitors both instances and does not depend on the DNS update. This allows RDS Proxy to deliver a faster failover response for client applications, maximizing availability during failovers. To set up RDS Proxy with your Amazon RDS Multi-AZ DB instance deployment, refer to Connecting to a database through RDS Proxy.

In a Multi-AZ DB instance deployment, Amazon RDS carries out maintenance operations such as instance class modifications and OS upgrades on the standby instance first (step 1 of the following figure). Once the standby catches up with the primary, Amazon RDS performs a planned failover (step 2), making the standby the new primary, and then finishes maintenance on the new standby, which is the old primary (step 3). When complete, Amazon RDS reconnects the primary and standby to resume storage-level replication for high availability. This approach reduces downtime because the only interruption to your application happens during the brief planned failover, which affects database connections and write operations. The following figure depicts the high-level process that Amazon RDS uses for most maintenance operations on a Multi-AZ DB instance deployment.

We have implemented several improvements to the planned failover process (step 2) and to database restart times for Amazon RDS for MySQL, MariaDB, and PostgreSQL. Combined with RDS Proxy, these optimizations reduce downtime and application impact during maintenance operations such as instance class modifications, OS upgrades, and reboots with forced failover for disaster recovery requirements.

Benchmarking

To assess the impact of these optimizations, we conducted 100 tests on an Amazon RDS Multi-AZ DB instance deployment integrated with RDS Proxy under a minimal write workload. We averaged the write downtime before and after the optimizations. We tracked this downtime with an application that measures the period between the first write failure and the next successful write. In our testing, we observed up to a 4.9x reduction in downtime during the 'instance modify' operation, up to a 4.8x reduction during 'OS upgrade', and up to a 3x reduction during reboots with forced (planned) failover. The results for each of the three engines (Amazon RDS for MySQL, MariaDB, and PostgreSQL) are shown in the following figures. These results are not absolute and may vary depending on your specific workloads.
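The measurement described above, the time between the first write failure and the next successful write, can be sketched as a simple probe loop. This is not the application used for the benchmark. The function name measure_write_downtime, the try_write callback, and the polling interval are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def measure_write_downtime(
    try_write: Callable[[], bool],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    interval: float = 0.1,
    max_attempts: int = 10_000,
) -> Optional[float]:
    """Return seconds between the first failed write and the next successful one.

    try_write performs one write and returns True on success, False on failure
    (the caller is expected to catch driver exceptions and return False).
    Returns None if no failure followed by a recovery is seen within max_attempts.
    """
    first_failure: Optional[float] = None
    for _ in range(max_attempts):
        now = clock()  # timestamp taken just before each attempt
        ok = try_write()
        if not ok and first_failure is None:
            first_failure = now  # start of the outage window
        elif ok and first_failure is not None:
            return now - first_failure  # first success after the outage
        sleep(interval)
    return None
```

In practice, try_write would wrap a small INSERT or UPDATE sent through the RDS Proxy endpoint. The resolution of the result is bounded by the polling interval plus the time each attempt takes to fail.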

The following graph compares the write downtime during the modify instance class operation from db.r5.xlarge to db.r5.large before and after optimizations using the default parameter group.

The following graph compares the write downtime during the OS upgrade operation before and after optimizations on instance class db.r5.xlarge using the default parameter group.

The following graph compares the write downtime during the reboot-with-force-failover operation before and after optimizations on instance class db.r5.xlarge using the default parameter group.

Note: Although Amazon RDS has reduced planned failover downtime, including database start times, the overall failover can still take longer when the engine requires a lengthy crash recovery, because extended crash recovery slows the database restart.

Conclusion

In this post, we showed the downtime reductions possible when you integrate an Amazon RDS for MySQL, MariaDB, or PostgreSQL Multi-AZ DB instance deployment with RDS Proxy. The three operations that benefit most are:

Modify instance class: Downtime reduced by up to 4.9 times for Amazon RDS for MariaDB, 4.3 times for Amazon RDS for MySQL, and 3.3 times for Amazon RDS for PostgreSQL
OS upgrades: Downtime reduced by up to 4 times for Amazon RDS for MariaDB, 4.8 times for Amazon RDS for MySQL, and 3.4 times for Amazon RDS for PostgreSQL
Reboot with force failover: Downtime reduced by up to 3 times for Amazon RDS for MariaDB, 2.5 times for Amazon RDS for MySQL, and 1.5 times for Amazon RDS for PostgreSQL

These improvements are now available across all Amazon RDS for MySQL, MariaDB, and PostgreSQL DB instances. You do not need to make any changes to your workload or DB instance to receive these benefits. We invite you to try out these operations on your DB instances to observe the impact of these improvements. If you have any questions or feedback, share them with us in the comments section.

About the author

Rajat Jain is a Software Development Engineer on the Amazon RDS Open Source Engines team. He specializes in architecting and implementing robust control plane components for open-source database engines. His expertise spans performance optimization, scalability enhancements, and high availability for RDS open-source database services.
