As organizations increasingly rely on cloud infrastructure, the role of database engines in providing resilient, reliable, and seamless long-term data backup has become pivotal. Regulations such as GDPR, HIPAA, and PCI DSS emphasize the necessity of robust data backup and recovery practices. Implementing a robust, forward-thinking data backup strategy empowers businesses to safeguard their critical information, maintain regulatory compliance, and protect against potential cybersecurity threats. By developing a comprehensive approach to data preservation and recovery, organizations can enhance operational resilience, maintain customer confidence, and create a solid foundation for long-term success.
In this post, we show you several long-term data backup strategies and how to effectively implement them in the AWS environment, with a focus on Amazon Relational Database Service (Amazon RDS) and Amazon Aurora.
To enable long-term data archiving on the AWS Cloud, organizations can use multiple strategic approaches:
- Backup management – AWS Backup offers a centralized platform for automating and simplifying backup processes across different AWS services, providing comprehensive data protection.
- Flexible data lifecycle management – Advanced data management strategies, including archiving, retention policies, deletion protocols, and versioning, enable organizations to optimize storage costs, enforce compliance, and maintain data accessibility and integrity throughout the data’s lifespan.
Long-term data backup strategies
Business organizations need long-term data backup strategies for the following reasons:
- Regulatory compliance – Many industries are governed by strict regulations requiring organizations to retain data for several years (such as healthcare, finance, and legal). Long-term backups make sure businesses meet legal and regulatory obligations, avoiding penalties and reputational risks.
- Audit and historical analysis – Organizations often need historical data for audits, financial reviews, or performance analysis. Long-term backups preserve the integrity and availability of this information.
- Business intelligence and trends – Historical data is critical for analyzing trends, forecasting, and strategic decision-making. Long-term backups allow organizations to access and use past data for future insights.
- Legal and litigation needs – In legal disputes, organizations might need to provide evidence from years back. Having long-term backups makes sure that such data is accessible when required.
Database engine options
When planning effective data management strategies, it’s crucial to understand the built-in tools and features provided by database engines. These options play a big role in making backup processes efficient and making sure data storage solutions are resilient. Let’s explore how these features contribute to building a strong foundation for long-term data backup, especially in the AWS Cloud.
In the following sections, we take a deep dive into each of the long-term backup options summarized in the following table. We explore the intricacies of automated backups and manual snapshots, and discuss their strengths, limitations, and best practices for implementation.
| Database Service | Automated Backups, Manual Snapshots, and AWS Backup Support | Engine Built-In Options | Other Options |
| --- | --- | --- | --- |
| Amazon RDS for Oracle | Yes | Oracle Data Pump, RMAN | AWS DMS and other tools such as Oracle GoldenGate, Commvault, and Druva |
| Amazon RDS for SQL Server | Yes | Native SQL backup and restore | AWS DMS |
| Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition | Yes | pg_dump, pg_restore, and pglogical | Snapshot export to Amazon S3, AWS DMS, and third-party tools |
| Amazon RDS for MySQL, Amazon Aurora MySQL-Compatible Edition, and Amazon RDS for MariaDB | Yes | mysqldump, mysqlpump, and MySQL Shell | Snapshot export to Amazon S3, AWS DMS, and third-party backup tools like mydumper and myloader |
| Amazon RDS for Db2 | Yes | Db2 backup | AWS DMS |
Automated backups
Automated backups are a feature of Amazon RDS that automatically takes snapshots of your database instances. Automated backup retention is limited to a maximum of 35 days, so to address requirements beyond 35 days, you can take manual snapshots or use snapshot-based backups, which are explained in subsequent sections. This option isn’t typically considered for long-term retention.
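For example, you can set the retention window to its 35-day maximum with the AWS CLI; the following is a minimal sketch, where the instance identifier is a placeholder:

```
# Raise automated backup retention to the 35-day maximum (identifier is hypothetical)
aws rds modify-db-instance \
    --db-instance-identifier mydbinstance \
    --backup-retention-period 35 \
    --apply-immediately
```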
Manual snapshots
Amazon RDS also allows you to create manual snapshots of your databases. These snapshots can be used to create new volumes or instances, or to restore data in case of data loss. Manual snapshots are stored on Amazon Simple Storage Service (Amazon S3) and can be copied to other Regions. A manual snapshot backup can be useful for backing up data that has changed since the last scheduled backup or for capturing a specific point in time.
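As a minimal sketch, you can create and copy a manual snapshot with the AWS CLI (identifiers, account ID, and Regions below are placeholders):

```
# Create a manual snapshot of an RDS instance (names are hypothetical)
aws rds create-db-snapshot \
    --db-instance-identifier mydbinstance \
    --db-snapshot-identifier mydbinstance-snapshot-2025

# Copy the snapshot to another Region for added resilience
aws rds copy-db-snapshot \
    --source-db-snapshot-identifier arn:aws:rds:us-east-1:123456789012:snapshot:mydbinstance-snapshot-2025 \
    --target-db-snapshot-identifier mydbinstance-snapshot-2025-copy \
    --region us-west-2
```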
In addition to continuous backup in Amazon RDS, there is the option of snapshot-based backups, which are offered through solutions like AWS Backup. AWS Backup includes features such as Vault Lock and logically air-gapped vaults, which can be particularly beneficial in meeting long-term data retention and compliance requirements. By creating discrete, point-in-time recovery points, organizations can maintain a comprehensive historical record of their data. This allows them to satisfy regulatory obligations around data preservation and accessibility, because these snapshot backups can be retained for extended periods to provide the necessary audit trail. Overall, the snapshot backup approach facilitated by services like AWS Backup enables organizations to effectively fulfill their long-term data retention and regulatory requirements.
AWS DMS
AWS Database Migration Service (AWS DMS) can migrate data from various source data stores, including Oracle, SQL Server, MySQL, PostgreSQL, Amazon RDS, and Aurora to Amazon S3 in a secure and efficient manner. The data is exported in a raw CSV format, which can be consumed by other applications or analytics tools. Archiving the data in S3 Glacier provides a cost-effective and durable storage solution for long-term data retention, and you can use AWS DMS to migrate infrequently accessed data.
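As a sketch, an S3 target endpoint for such a migration task could be created with the AWS CLI as follows (the endpoint identifier, role ARN, and bucket name are placeholders):

```
# Create a DMS target endpoint that writes CSV output to S3 (names are hypothetical)
aws dms create-endpoint \
    --endpoint-identifier s3-archive-target \
    --endpoint-type target \
    --engine-name s3 \
    --s3-settings ServiceAccessRoleArn=arn:aws:iam::123456789012:role/dms-s3-access,BucketName=my-archive-bucket,CompressionType=gzip
```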
Snapshot export to Amazon S3
Amazon RDS provides the snapshot export to Amazon S3 feature, which allows you to export database snapshots from various Amazon RDS engines (such as PostgreSQL, MySQL, and MariaDB) to an S3 bucket in Apache Parquet format. For supported DB engines in Amazon RDS, refer to Supported Regions and DB engines for exporting snapshots to S3 in Amazon RDS, and for Aurora DB engines, refer to Supported Regions and Aurora DB engines for exporting snapshot data to Amazon S3. This exported data can then be analyzed using services such as Amazon Athena, Amazon Redshift Spectrum, or other big data processing frameworks.
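For example, a snapshot export can be started with the AWS CLI; the following is a minimal sketch in which the ARNs, bucket name, and identifiers are placeholders:

```
# Export an RDS snapshot to S3 in Parquet format (all names are hypothetical)
aws rds start-export-task \
    --export-task-identifier mydb-snapshot-export \
    --source-arn arn:aws:rds:us-east-1:123456789012:snapshot:mydb-snapshot \
    --s3-bucket-name my-export-bucket \
    --iam-role-arn arn:aws:iam::123456789012:role/rds-s3-export \
    --kms-key-id arn:aws:kms:us-east-1:123456789012:key/<key-id>
```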
Amazon RDS and Aurora integration with Amazon S3
Aurora provides integration with Amazon S3, allowing you to load data directly from Amazon S3 into Aurora database tables using SQL commands such as LOAD DATA FROM S3. Additionally, you can export an entire Aurora database cluster directly to Amazon S3 without creating a snapshot, enabling efficient data analysis and processing. Amazon S3 integration is also supported in Amazon RDS for PostgreSQL.
This feature is supported with Aurora MySQL-Compatible using the SELECT INTO OUTFILE S3 statement. It’s supported with Aurora PostgreSQL-Compatible using the aws_s3 extension.
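The following is a minimal sketch of both approaches. The bucket, prefix, and table names are placeholders; each approach requires an IAM role granting the cluster access to Amazon S3, and the Aurora PostgreSQL variant requires the aws_s3 extension to be installed first:

```
-- Aurora MySQL: export query results directly to S3 (names are hypothetical)
SELECT * FROM orders
INTO OUTFILE S3 's3://my-archive-bucket/exports/orders'
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- Aurora PostgreSQL: export via the aws_s3 extension (run CREATE EXTENSION aws_s3 first)
SELECT * FROM aws_s3.query_export_to_s3(
    'SELECT * FROM orders',
    aws_commons.create_s3_uri('my-archive-bucket', 'exports/orders.csv', 'us-east-1'),
    options := 'format csv, header true'
);
```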
Database engine built-in and other options
In this section, we explore some engine-specific options for Amazon RDS and Aurora, highlighting features and best practices tailored to MySQL, PostgreSQL, Oracle, DB2, and SQL Server. These insights will help you fine-tune your backup strategy to use the full potential of your chosen database engine.
Amazon RDS for Oracle
The following are the options for long-term backups of Amazon RDS for Oracle. These options differ in their backup output formats: Data Pump creates logical backups, whereas RMAN creates physical backups of Oracle data or schemas.
Data Pump
Oracle Data Pump emerges as a reliable and versatile choice for organizations seeking a native option to run long-term data backups in the AWS Cloud. Its familiarity within the Oracle community, adaptability to different backup scopes, suitability for varying database sizes, and the ability to optimize performance through parallel options and tuning parameters position it as a formidable tool in the arsenal of data management strategies. It’s recommended that you follow the best practices for large migrations.
To use Data Pump in an Amazon RDS for Oracle environment, follow these steps:
- To export the data from your source Oracle database, use the expdp (Data Pump export) command, similar to the following example:
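A minimal sketch of the export follows; the credentials and connect string are placeholders, and the mydumpdir directory object must already exist on the source database:

```
# Full export of the source database (credentials and connect string are hypothetical)
expdp admin/password@<source_db> \
    full=Y \
    directory=mydumpdir \
    dumpfile=mydatabase.dmp \
    logfile=mydatabase_export.log
```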
This command exports the entire database to a file named mydatabase.dmp in the mydumpdir directory.
- Upload the dump file to an S3 bucket. You can use the AWS Command Line Interface (AWS CLI) or the Amazon S3 console to do this.
- Import the data into your RDS for Oracle instance using the impdp (Data Pump import) command:
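Because Amazon RDS for Oracle is fully managed, the dump file must first be transferred from Amazon S3 to the instance’s DATA_PUMP_DIR. The following is a minimal sketch, assuming the Amazon S3 integration option is enabled on the instance; the bucket name, credentials, and endpoint are placeholders:

```
-- Run from a SQL client connected to the RDS for Oracle instance
-- (bucket name is hypothetical)
SELECT rdsadmin.rdsadmin_s3_tasks.download_from_s3(
    p_bucket_name    => 'my-dump-bucket',
    p_s3_prefix      => 'mydatabase.dmp',
    p_directory_name => 'DATA_PUMP_DIR'
) AS task_id FROM dual;
```

Then run the import against the instance endpoint:

```
# Import the dump file into the RDS for Oracle instance (endpoint is hypothetical)
impdp admin/password@<rds_endpoint>:1521/<db_name> \
    full=Y \
    directory=DATA_PUMP_DIR \
    dumpfile=mydatabase.dmp \
    logfile=mydatabase_import.log
```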
Note: Most of the limitations and restrictions described in the earlier Amazon RDS and Aurora integration with Amazon S3 section also apply to the RDS for Oracle engine.
RMAN
Oracle Recovery Manager (RMAN) is a tool designed for managing and automating backup and recovery processes within the Oracle Database ecosystem. However, due to the fully managed nature of Amazon RDS for Oracle, you can’t directly restore an RMAN backup into these environments. Instead, you need to restore the RMAN backup to an on-premises or self-managed Oracle Database instance, and then use other methods, such as the database migration tools provided by AWS, to migrate the restored data into your Amazon RDS environment.
The following is an example of how you can use RMAN to perform a full database backup:
- Connect to your RDS for Oracle instance using your preferred SQL client, such as SQL Developer or the Amazon RDS console.
- Start the RMAN command line interface:
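A minimal sketch, assuming RMAN is run from a host that can reach the database (on Amazon RDS for Oracle itself, RMAN functionality is instead exposed through the rdsadmin.rdsadmin_rman_util package, covered in the automation post linked below); the credentials and endpoint are placeholders:

```
rman target admin/password@<db_endpoint>:1521/<db_name>
```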
- Use the following command to create a full backup of your entire database, including data files, control files, and archived redo logs:
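A minimal sketch of a classic RMAN full backup: PLUS ARCHIVELOG includes the archived redo logs, and control file autobackup covers the control file.

```
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
```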
- Monitor the backup progress:
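One common way to monitor RMAN progress is to query v$session_longops from a second session; the following is a sketch, not the only option:

```
-- Show in-progress RMAN operations and percent complete
SELECT sid, opname, ROUND(sofar / totalwork * 100, 1) AS pct_done
FROM   v$session_longops
WHERE  opname LIKE 'RMAN%'
AND    totalwork > 0
AND    sofar < totalwork;
```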
This command displays the status of the backup, including the files being backed up and the overall progress.
For details on how to automate the backup, refer to Automate Amazon RDS backups using the Oracle RMAN utility and upload backup files to Amazon S3.
Amazon RDS for SQL Server
Amazon RDS for SQL Server offers a command line utility called rds_backup_database, which you can use to perform manual backups of your databases. This utility provides a basic level of backup functionality, allowing you to create full, differential, or transactional log backups of your RDS for SQL Server databases. For more details, see Backup and Restore Strategies for Amazon RDS for SQL Server.
To create a backup, use the following command:
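For example, a full backup to Amazon S3 might look like the following; the database and bucket names are placeholders, and the instance’s option group must include the native backup and restore option:

```
-- Full backup of an RDS for SQL Server database to S3 (names are hypothetical)
exec msdb.dbo.rds_backup_database
    @source_db_name='mydatabase',
    @s3_arn_to_backup_to='arn:aws:s3:::my-backup-bucket/mydatabase.bak',
    @overwrite_s3_backup_file=1,
    @type='FULL';
```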
Although the rds_backup_database utility can be a useful tool for creating on-demand backups, it’s important to note that the long-term restorability of these backups is not guaranteed, especially in the face of potential future changes to the underlying SQL Server engine version or other database technologies.
Relying solely on the rds_backup_database utility might not be sufficient to meet your organization’s data protection and compliance requirements, particularly if you need to maintain the recoverability of your data over an extended period, such as 10 years or more. The backup files created by this utility are stored in Amazon S3, but their compatibility with future versions of SQL Server is not assured.
To maintain the long-term restorability of your RDS for SQL Server database backups, we recommend considering a more comprehensive backup and recovery strategy. This could involve:
- Implementing a backup solution that uses native SQL Server backup functionality, such as SQL Server Management Studio (SSMS) or PowerShell scripts. These backups are more likely to be compatible with future SQL Server versions.
- Regularly testing your backup and restore processes to validate the integrity and recoverability of your data.
- Maintaining a backup retention policy that aligns with your organization’s data retention requirements and regulatory obligations.
By adopting a more robust backup strategy, you can have greater confidence in your ability to restore your RDS for SQL Server databases, even in the case of long-term changes to the underlying database technology.
Amazon RDS for PostgreSQL and Aurora PostgreSQL-Compatible
The open-source nature of PostgreSQL allows other options as well. The following are the standard long-term backup options in practice across industries.
pg_dump and pg_restore
pg_dump and pg_restore are command line tools for backing up and restoring a PostgreSQL database. You can use pg_dump to create a backup of your database, and pg_restore to restore the database in case of data loss or disaster.
To create a backup, use the following command:
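A minimal sketch follows; the host, user, port, path, and database name are placeholders, and the custom format (-Fc) is used so pg_restore can consume the output:

```
# Dump the database in custom format (all values are hypothetical)
pg_dump -h <host_name> -U <user_name> -p <port> -Fc \
    -f <backup_file_path>/mydb.dump <database_name>
```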
Provide the host name or IP address of your PostgreSQL server, the user name of the PostgreSQL server, the port number of your PostgreSQL server, the path to the directory where you want to save the backup file, and the name of your database.
The backup_file_path could be on your local computer, an Amazon Elastic Compute Cloud (Amazon EC2) instance, or an on-premises server, depending on where you’re running the command from.
After creating the backup, you can use the following command to restore the database:
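A minimal sketch of the restore, using the same placeholder values as the dump command above:

```
# Restore the custom-format dump into the target database
pg_restore -h <host_name> -U <user_name> -p <port> \
    -d <database_name> <backup_file_path>/mydb.dump
```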
Amazon RDS for MySQL, Amazon RDS for MariaDB, and Aurora MySQL-Compatible
The MySQL and MariaDB engines offer the following options, which you can choose based on your use case or customer requirements.
mysqldump
mysqldump is a built-in database backup tool for MySQL. This tool creates logical backups and is part of the MySQL client programs. mysqldump supports single-threaded processing.
To use mysqldump to back up your RDS for MySQL, RDS for MariaDB, or Aurora MySQL database, run the following command:
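A minimal sketch; the host and user are placeholders, and --single-transaction provides a consistent dump for InnoDB tables without locking:

```
# Logical dump of one database (host, user, and database name are hypothetical)
mysqldump -h <host_name> -u <user_name> -p \
    --single-transaction --routines --triggers \
    <database_name> > backup.sql
```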
Provide the host name or IP address of your server, your database user credentials, and the name of your database. The backup file, backup.sql, will be saved in the current directory. This could be on your local computer, an EC2 instance, or an on-premises server, depending on where you’re running the command from.
The following command will restore the entire database, including the schema and data, from the backup file:
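A minimal sketch of the restore, using the same placeholder values:

```
# Replay the logical dump into the target database
mysql -h <host_name> -u <user_name> -p <database_name> < backup.sql
```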
mydumper and myloader
mydumper exports a consistent backup of MySQL databases. It supports backing up the database by using multiple parallel threads, up to one thread per available CPU core.
To use mydumper and myloader, follow these steps:
- Install mydumper and myloader on your local machine or on an EC2 instance in the same Region as your RDS for MySQL or Aurora MySQL database.
- Make sure you have the necessary credentials to access your RDS for MySQL or Aurora MySQL database.
- Use mydumper to create a full backup of your RDS MySQL or Aurora MySQL database in the specified output directory:
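A minimal sketch; the endpoint, credentials, and output path are placeholders, and the thread count can be tuned to the available CPU cores:

```
# Parallel logical backup (endpoint and credentials are hypothetical)
mydumper --host <instance_endpoint> --user <user_name> --password <password> \
    --outputdir /path/to/backup --threads 4 --compress
```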
Provide your instance endpoint, your user credentials, and the backup file path.
- Use myloader to restore the backup data to your RDS for MySQL, RDS for MariaDB, or Aurora MySQL database:
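A minimal sketch of the restore, pointing at the directory produced by mydumper above:

```
# Parallel restore from the mydumper output directory
myloader --host <instance_endpoint> --user <user_name> --password <password> \
    --directory /path/to/backup --threads 4 --overwrite-tables
```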
MySQL Shell
MySQL Shell is an advanced client and code editor for MySQL. In addition to SQL functionality similar to the standard MySQL client, MySQL Shell provides scripting capabilities for JavaScript and Python and includes APIs for working with MySQL.
The MySQL Shell provides several options for creating backups of your database. Let’s take a look at each one:
- util.dumpTables() – The util.dumpTables() function allows you to create a backup of specific tables within your database. In the following example, we create a backup of the table1 and table2 tables from the my_database database and save the backup files to the /path/to/backup/directory directory:
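A minimal sketch, run from a MySQL Shell session in JavaScript mode; the database, table names, and path are the placeholders named above:

```
// Dump two specific tables to a local directory
util.dumpTables("my_database", ["table1", "table2"], "/path/to/backup/directory");
```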
- util.dumpSchemas() – You can use the util.dumpSchemas() function to create a backup of entire schemas (databases) within your instance. In the following example, we create a backup of the my_database and another_database schemas, and save the backup files to the /path/to/backup/directory directory:
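A minimal sketch with the placeholder schema names from above:

```
// Dump two entire schemas to a local directory
util.dumpSchemas(["my_database", "another_database"], "/path/to/backup/directory");
```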
- util.dumpInstance() – You can use the util.dumpInstance() function to create a backup of your entire database instance. This is useful if you need to create a full backup of your database, including schemas and tables. In the following example, we create a backup of the entire database instance and save the backup files to the /path/to/backup/directory directory:
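A minimal sketch; the output path is a placeholder:

```
// Dump the entire instance, including all schemas and tables
util.dumpInstance("/path/to/backup/directory");
```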
Amazon RDS for Db2
The BACKUP DATABASE command in Db2 allows you to create a backup copy of your database or specific tablespaces. Complete the following steps to use BACKUP DATABASE to back up your RDS for Db2 database:
- Open the Amazon RDS console.
- Select your Db2 instance and note the endpoint and port number.
- Open a terminal or command prompt on your local machine.
- Install the AWS CLI tool if you haven’t already.
- Configure the AWS CLI tool with your AWS credentials.
- Use the following command to connect to your Db2 instance using the Amazon RDS for Db2 CLI (provide the appropriate values for your Db2 instance):
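A minimal sketch using the Db2 command line processor; the node name, database alias, endpoint, port, and user are placeholders:

```
# Catalog the remote instance and connect (all names are hypothetical)
db2 catalog tcpip node rdsnode remote <db2_endpoint> server <port>
db2 catalog database <database_name> as rdsdb at node rdsnode
db2 connect to rdsdb user <master_username>
```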
- After connecting to your Db2 instance, use the following command to initiate a full backup:
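A minimal sketch of the native Db2 syntax; the database name and path are placeholders, and writing directly to an S3 bucket requires additional configuration, such as a Db2 remote storage alias:

```
# Full backup to a target location (name and path are hypothetical)
db2 backup database <database_name> to /path/to/backup/location
```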
Provide the name of your database and the desired location for the backup file. This could be a local directory on your machine or an S3 bucket.
The backup process will begin, and you can monitor the progress in the terminal.
Conclusion
In this post, we explored the rich landscape of long-term data backup for RDS databases, showcasing the diverse range of options available to organizations. From the built-in AWS solutions such as automated backups and snapshot exports to Amazon S3, to the more advanced strategies involving AWS Backup, we highlighted the powerful and flexible tools at your disposal.
If you have any feedback or questions, leave them in the comments section.
About the Authors
Babaiah Valluru is working as Lead Consultant – Databases in the Professional Services team at AWS. He specializes in database migrations, helping customers with both homogeneous and heterogeneous database migrations, as well as data analytical services. In addition to his expertise in database technologies, Babaiah is passionate about exploring and learning AI and ML services. With a deep understanding of open-source databases like MySQL, PostgreSQL, and MongoDB, Babaiah uses this knowledge to guide AWS customers through complex cloud transformation projects. In his free time, Babaiah enjoys traveling and spending time with his family and friends.
Shailesh K Mishra serves as a Senior Solutions Architect in the Financial Services Industry (FSI) team at AWS. He specializes in database and analytics technologies and migration strategies. Using his expertise, Shailesh guides customers in their AWS Cloud adoption journey, crafting robust solutions aligned with AWS Well-Architected best practices. As a generative AI enthusiast, he also empowers clients to harness the potential of this cutting-edge technology.