In this post, we walk you through performing schema changes and common maintenance tasks such as table and index reorganization, VACUUM FULL, and materialized view refreshes with minimal downtime using blue/green deployments for an Amazon Relational Database Service (Amazon RDS) for PostgreSQL database or an Amazon Aurora PostgreSQL-Compatible Edition cluster.
Solution overview
Amazon RDS blue/green deployment enables you to perform changes in a staging environment that is isolated from the production system. It first creates a staging environment (green) that is a topological copy of your current production environment (blue). The blue and green environments stay in sync using logical replication. This feature is available on Amazon RDS for PostgreSQL version 11.21 onwards.
The blue/green environment is ready for maintenance activities when there is minimal or no replication lag between the source database (blue) and the staging database (green). You can perform the maintenance activities in the green environment while the application continues to use the blue database. We highly recommend scheduling maintenance activities at a low-activity time for your application; this avoids higher replica lag and in turn improves the overall performance of the test.
With blue/green deployment, you can address the following use cases with minimal downtime or no impact on the source environment:
Gather statistics using ANALYZE
Perform VACUUM FULL on catalog tables
Adjust the fill factor for a table
Rebuild indexes for optimization
Add new columns with and without default values for non-serial data types
Expand the column size in a table
Refresh materialized views
Enable the required extensions in shared_preload_libraries
Now, let's delve into the scenarios where you can use blue/green deployment to implement database changes with minimal downtime. Refer to Limitations for blue/green deployments to ensure compatibility before using this feature.
Prerequisites
Complete the following steps to set up the test environment:
Create an RDS for PostgreSQL instance or an Amazon Aurora PostgreSQL cluster if you don't already have one. For instructions, refer to Create a PostgreSQL DB instance or Create an Amazon Aurora PostgreSQL DB cluster, respectively.
Create a new cluster parameter group with rds.logical_replication set to 1 so that WAL is decoded into logical change records, and max_logical_replication_workers set to 10 (there is one logical replication slot for each customer database, along with the rdsadmin database).
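A sketch of these steps with the AWS CLI (the group name and parameter group family are placeholders; adjust them to your engine version):

```bash
# Create a cluster parameter group (family shown for Aurora PostgreSQL 15)
aws rds create-db-cluster-parameter-group \
    --db-cluster-parameter-group-name bg-demo-pg \
    --db-parameter-group-family aurora-postgresql15 \
    --description "Parameter group for blue/green deployment"

# Enable logical replication and raise the worker count
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name bg-demo-pg \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot" \
                 "ParameterName=max_logical_replication_workers,ParameterValue=10,ApplyMethod=pending-reboot"
```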
Associate the cluster parameter group with the newly created RDS instance. For more information, refer to New – Fully managed Blue/Green Deployment in Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL.
Create an Amazon Elastic Compute Cloud (Amazon EC2) instance to install the PostgreSQL client to access the RDS for PostgreSQL instance. For instructions, refer to Create your EC2 resources and launch your EC2 instance. Or you can set up connectivity between your RDS database and EC2 compute instance in one click.
Install the PostgreSQL client. On Amazon Linux 2023, you can use the following commands to download the psql command line tool:
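For example (the package version available in the repositories may differ):

```bash
sudo dnf install -y postgresql15
```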
Create a sample database for testing on the source database:
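For example, from a psql session connected to the blue (source) instance (the database name is a placeholder):

```sql
CREATE DATABASE pgbgtest;
```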
We use pgbench, a simple PostgreSQL benchmarking tool, for our test case. Run the pgbench command with the insert-only workload option to create four tables and ingest test data:
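A minimal sketch (the scale factor, endpoint, and database name are illustrative); pgbench -i creates and populates the pgbench_accounts, pgbench_branches, pgbench_history, and pgbench_tellers tables:

```bash
pgbench -i -s 100 -h <blue-endpoint> -U postgres pgbgtest
```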
Create the RDS blue/green environment. Refer to New – Fully managed Blue/Green Deployment in Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL for the prerequisites and instructions to set up the blue/green environment.
When the blue/green environment is ready and replication lag is minimal, establish a connection to the green environment using the psql client. The green environment defaults to read-only mode. To enable write operations, set the default_transaction_read_only parameter to off at the session level:
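```sql
SET default_transaction_read_only TO off;
```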
Note: You can run VACUUM and ANALYZE operations in the green environment without enabling write mode at the session level.
Verify the database session is in read/write mode:
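```sql
-- Should return "off", indicating the session can perform writes
SHOW default_transaction_read_only;
```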
Make sure you continually monitor the replication lag between the blue and green environments, as described in the following section.
Monitor replication lag on green environment
When making write operation changes to the green environment, it's important to monitor the replication slots from the blue environment. If there are any logical replication issues, the replication lag will continue to increase. If this happens, the green environment stops consuming WAL, WAL accumulates, and disk usage grows, which can eventually cause the database to hang.
The following query checks the gap from the blue environment using the log sequence number (LSN):
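```sql
-- Run on the blue environment: bytes of WAL not yet confirmed by the green environment
SELECT slot_name,
       confirmed_flush_lsn,
       pg_current_wal_lsn(),
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lsn_distance
FROM pg_replication_slots
WHERE slot_type = 'logical';
```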
Monitor the RDS for PostgreSQL log in the green environment:
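For example, with the AWS CLI (the instance identifier and log file name are placeholders):

```bash
# List available log files for the green instance
aws rds describe-db-log-files --db-instance-identifier <green-instance-id>

# Download a portion of a specific log file
aws rds download-db-log-file-portion \
    --db-instance-identifier <green-instance-id> \
    --log-file-name error/postgresql.log.2024-01-01-00 \
    --output text
```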
Use cases
With the test environment set up for blue/green deployment, let’s dive into practical code examples showcasing various maintenance tasks achievable through Amazon RDS blue/green deployments.
Gather statistics using ANALYZE
A major version upgrade (MVU) of an Amazon RDS for PostgreSQL database employs the pg_upgrade module. However, this module doesn't transfer optimizer statistics from the prior version, which can hamper query performance post-upgrade. Therefore, it's crucial to run the ANALYZE command to regenerate these statistics for each database in the primary instance.
The blue/green deployment method is an effective strategy to mitigate the impact of this operation. It allows users to run the ANALYZE operation on the green (new) environment before transitioning the application workload to it.
However, it’s important to note that during a blue/green deployment setup for a major version upgrade, the optimizer statistics will not be available in the green environment. You can validate these table-level statistics by querying pg_stats:
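```sql
-- Immediately after a major version upgrade, this returns no rows
SELECT schemaname, tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'pgbench_accounts';
```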
If no statistics are found, they need to be updated. Although the green environment continues to be logically replicated from the blue environment, the ANALYZE operation is not logically replicated. Therefore, you must manually run the ANALYZE command in the green environment.
The following is the command to analyze the tables on the green environment:
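```sql
ANALYZE VERBOSE public.pgbench_accounts;
```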
This command returns a detailed analysis of the public.pgbench_accounts table, including the number of live rows, dead rows, rows in the sample, and estimated total number of rows.
Validate the statistics gathered by the ANALYZE command to confirm they’ve been updated correctly:
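```sql
-- Confirm statistics were collected and check the last analyze time
SELECT relname, last_analyze, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'pgbench_accounts';
```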
Perform VACUUM FULL on catalog tables
In PostgreSQL, bloat refers to the excess space that is occupied by dead tuples, which are essentially remnants of old data that is no longer valid. This situation can occur due to frequent updates and deletions in your database tables.
Catalog table bloat is primarily caused by frequently creating and dropping objects within your database. To remove this bloat and reduce the size of the table, you need to perform a VACUUM FULL operation, which requires an ACCESS EXCLUSIVE lock on the tables. On catalog tables, this lock blocks all queries to the database.
Although tools like pg_repack can help reduce downtime when dealing with bloat on user tables, because they rewrite the table while locking it only briefly, this approach doesn't work with catalog tables. For more information on pg_repack, see Remove bloat from Amazon Aurora and RDS for PostgreSQL with pg_repack.
However, in a blue/green deployment model, you can effectively manage this downtime. You can run the VACUUM FULL operation on the green environment while the blue environment continues to handle live traffic, causing minimal disruption to your database operations. The following is the command to run on the green environment:
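```sql
-- pg_attribute is an illustrative example of a catalog table that can bloat;
-- VACUUM FULL takes an ACCESS EXCLUSIVE lock, which is acceptable on green
VACUUM FULL VERBOSE pg_catalog.pg_attribute;
```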
Adjust the fill factor for a table
Adjusting the fill factor of a table is a crucial operation that can have a significant impact on performance. The fill factor is the percentage of space on each page (the basic unit of storage in PostgreSQL) to be filled with data, reserving the rest for updates to that data. For example, a fill factor of 70 means PostgreSQL tries to leave 30% of every page unfilled when it writes it out, so that subsequent updates of existing data can place the updated rows on the same page as the original. This avoids costly row migrations to new pages.
For example, your business might have an application with a table that receives a high volume of updates. Initially, the table was created with a fill factor of 100, which means pages are completely filled when data is inserted. Over time, as rows are updated, the database needs to find new pages to store the updated rows, leading to bloat and inefficient use of space.
Since the database spends a lot of time looking for free space and moving rows around, these frequent updates cause performance issues. In these types of workloads, you can change the fill factor of this table to leave enough free space for updates.
However, changing the fill factor of a table needs an ACCESS EXCLUSIVE LOCK for a significant amount of time (depending on the size of the table). To avoid the downtime, you can run this operation on the green environment. Complete the following steps:
Set the session as read/write and then perform the ALTER:
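```sql
SET default_transaction_read_only TO off;
```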
Change the FillFactor for the pgbench_tellers table:
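```sql
-- The fillfactor value is illustrative; 70 reserves 30% of each page for updates
ALTER TABLE pgbench_tellers SET (fillfactor = 70);
```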
Reorganize the table to make fillfactor effective for all blocks in the table:
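```sql
-- Rewrite the table so that existing pages honor the new fillfactor
VACUUM FULL pgbench_tellers;
```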
Concurrent to the preceding operation, we can run a heavy DML workload activity on the table in the blue environment:
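A sketch of such a workload (the client count, threads, and duration are illustrative):

```bash
pgbench -c 10 -j 2 -T 300 -h <blue-endpoint> -U postgres pgbgtest
```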
Due to the heavy workload on the table, we saw an increase in the lag (lsn_distance) reported by the monitoring query shown earlier. However, it didn't impact the replication: the table was altered successfully in the green environment without any issues, and the green environment was able to catch up on the lag from the blue environment.
Rebuild indexes for optimization
Reindexing is an essential operation that is typically run when indexes become corrupt or bloated. PostgreSQL 12 introduced the ability to rebuild indexes using the CONCURRENTLY option. This feature enables rebuilding indexes while minimizing the locking of writes, which is a significant improvement for tables requiring high availability, where even a brief lock might not be acceptable.
Using the blue/green deployment model can be particularly beneficial in such cases. By rebuilding indexes with the CONCURRENTLY option in the green environment, you can minimize performance impacts on the application and prevent locking issues.
To confirm that the database is in read/write mode, run the following commands:
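```sql
SET default_transaction_read_only TO off;
SHOW default_transaction_read_only;  -- should return "off"
```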
After the validation, perform the REINDEX operation on the index:
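```sql
-- Rebuilds the index while only briefly locking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY pgbench_accounts_pkey;
```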
In our test, we observed that the green environment experienced high IO:DataFileRead waits. This is expected due to the concurrent reading and rebuilding of the index. Importantly, no locking or other waits were observed, because the CONCURRENTLY option only takes brief locks during the final swap of the index.
Add new columns with and without default values for non-serial data types
In this section, we demonstrate how to add a column to a table in the green environment. The ALTER TABLE operation supports adding a column with or without a default value. However, adding a column of a serial data type (an auto-incremented column) is not supported, because it breaks logical replication.
After you alter the table by adding a column (with or without a default value), replication continues to work. This is useful in dynamic environments where you need to add a column to a table without impacting application availability.
Blue/green deployment does not support the addition of auto-incremented columns (serial data type). In our testing on the green environment, logical replication could not apply the nextval() default created by a serial column, whereas a plain int column was added with no issues.
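A sketch of this behavior (the table and column names are assumed for illustration):

```sql
-- Adding a column with a default value replicates without issues
ALTER TABLE pgbench_branches ADD COLUMN region varchar(20) DEFAULT 'us-east-1';

-- Adding a column without a default value also works
ALTER TABLE pgbench_branches ADD COLUMN notes text;

-- Avoid: a serial column adds a nextval() default backed by a sequence,
-- which breaks logical replication between blue and green
-- ALTER TABLE pgbench_branches ADD COLUMN branch_seq serial;
```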
Expand the column size in a table
In database management, it’s possible to expand data types based on your requirements. For example, you can scale from an integer (int) to a large integer type like bigint, or even further to numeric. Similarly, the character data type (char(n)) can be expanded to varchar or text.
However, such expansions require downtime while the table is locked and, in some cases (such as int to bigint), rewritten. The duration of this downtime is directly proportional to the size of the table you're altering; larger tables demand longer downtime.
To minimize the impact of this downtime, you can expand the column in the green environment.
On the green environment, expand the id column from INT to BIGINT:
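A sketch using the aid column of pgbench_accounts as a stand-in for the id column:

```sql
-- Rewrites the table and holds an ACCESS EXCLUSIVE lock for the duration
ALTER TABLE pgbench_accounts ALTER COLUMN aid TYPE bigint;
```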
Create another session on the database and from that session, check the lock on the table:
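```sql
-- While the ALTER runs, the table shows an AccessExclusiveLock
SELECT locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE relation = 'pgbench_accounts'::regclass;
```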
Check the replication by updating a row on the blue environment:
On the blue environment run:
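```sql
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = 1;
```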
On the green environment run:
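```sql
-- Confirm the updated value arrived through logical replication
SELECT aid, abalance FROM pgbench_accounts WHERE aid = 1;
```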
Review the metrics in Amazon RDS Performance Insights for the blue environment to confirm that there was no locking due to the column expansion on the green environment.
Refresh materialized views
A current limitation of logical replication is that a materialized view refresh is not replicated, because logical replication does not decode changes to materialized views. Alternatively, you can perform the refresh manually on the green environment just prior to switchover so that the views are up to date when the green environment takes over.
Let’s set up the environment:
Create a materialized view on the green environment:
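The view name and definition here are illustrative:

```sql
CREATE MATERIALIZED VIEW branch_balances AS
SELECT bid, sum(abalance) AS total_balance
FROM pgbench_accounts
GROUP BY bid;
```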
Perform an INSERT operation on the underlying tables of the materialized view in the blue environment:
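```sql
-- On blue: the new row replicates to green, but the materialized view
-- on green does not reflect it until it is refreshed
INSERT INTO pgbench_accounts (aid, bid, abalance, filler)
VALUES (10000001, 1, 500, '');
```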
Refresh the materialized view in the green environment:
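```sql
REFRESH MATERIALIZED VIEW branch_balances;
```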
During creation as well as during the refresh of the materialized view, there was no lag on the logical replication, even with data being actively inserted into the underlying tables on the blue environment.
Because you can create and refresh materialized views on the green environment, the views are up to date and available for use immediately after the switchover. A pre-switchover refresh of a materialized view can improve the user experience and reduce downtime.
Enable required extensions in shared_preload_libraries
Some extensions must be listed in the shared_preload_libraries parameter before they can be enabled, and changing this parameter requires a database restart.

With a green environment in place, you can change the parameter and restart the instance in the green environment without disrupting ongoing operations in the blue environment. When the green environment is up and running smoothly with the updated parameter, you can point the application from the blue environment to the green. This step-by-step approach results in minimal disruption while maintaining operational efficiency.
To update this parameter on the green environment, you need to modify the value in the cluster parameter group and restart the instance.
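A sketch with the AWS CLI (the parameter group name, library list, and instance identifier are placeholders):

```bash
# Add the required libraries to the green cluster's parameter group
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name bg-demo-pg-green \
    --parameters '[{"ParameterName":"shared_preload_libraries","ParameterValue":"pg_stat_statements,pg_cron","ApplyMethod":"pending-reboot"}]'

# Restart the green instance so the static parameter takes effect
aws rds reboot-db-instance --db-instance-identifier <green-instance-id>
```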
Limitations
The standard PostgreSQL logical replication limitations apply to blue/green deployments. For more information, refer to Limitations for blue/green deployments.
Clean up
To clean up after switchover, delete the database instances in the blue environment. If you created a new RDS for PostgreSQL instance to follow along with this post, delete the blue/green deployment object and green environment as well. For instructions, see Deleting a DB instance.
Conclusion
Other than manual operations that can't be logically replicated, the green environment is well-suited for maintenance operations that could potentially disrupt the performance of a production database. This includes altering a table's fill factor, removing table bloat using VACUUM FULL (or pg_repack), and removing index bloat using REINDEX CONCURRENTLY.
This post demonstrated a method to implement schema changes in Amazon RDS for PostgreSQL with minimal impact using blue/green deployments. You can apply the same approach to Amazon Aurora PostgreSQL.
We welcome your feedback; leave your comments or questions in the comments section.
About the authors
Anita Singh is a Principal Specialist Solutions Architect with over 20 years of experience architecting solutions focused on database technologies. She guides customers in designing and building highly scalable, available, and performant applications, and actively engages with and advises customers on their database migration and modernization journey to or within the AWS Cloud.
Baji Shaik is a Sr. Lead Consultant with AWS Professional Services, Global Competency Centre. His background spans a wide breadth of expertise and experience in SQL and NoSQL database technologies. He is a database migration expert and has developed many successful database solutions addressing challenging business requirements for moving databases from on premises to Amazon RDS and Amazon Aurora PostgreSQL/MySQL. He is an eminent author, having written several books on PostgreSQL, including “PostgreSQL Configuration,” “Beginning PostgreSQL on the Cloud,” and “PostgreSQL Development Essentials.”
Sarabjeet Singh is a Database Specialist Solutions Architect at AWS. He works with customers to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.
Nathan Ballance is a Cloud Support Engineer (Databases) at Amazon Web Services, specializing in providing customized solutions for AWS customers. He provides troubleshooting support and collaborates with AWS service teams to resolve complex customer issues, addressing a wide range of technical challenges.