In Part 1 of this series we discussed dealing with multiple database engines or schemas that share data with each other, you’re faced with the difficulty of establishing cross-engine database connectivity. To overcome this challenge, PostgreSQL offers foreign data wrappers (FDW), which are libraries that facilitate communication with external data sources by abstracting the complexities of data source connection and data retrieval.
The PostgreSQL mysql_fdw
extension is a FDW implementation for MySQL, facilitating direct connectivity between a PostgreSQL database and a MySQL or a MariaDB database. The mysql_fdw extension is supported for Amazon Aurora PostgreSQL-Compatible Edition version 15.4, 14.9, 13.12, 12.16 and higher releases and Amazon RDS for PostgreSQL version 14.2, 13.6 and higher releases.
In this post, we focus on working with the features of mysql_fdw PostgreSQL extension on Amazon RDS for PostgreSQL to help manage a large set of data that on an external database scenarios. It enables you to interact with your MySQL database for importing individual/large/selectively number of objects at the schema level and simplifying how we get information about the MySQL/MariaDB schema, to make it easier to ultimately read/write data. We will also provide an introduction to understand query performance on foreign tables.
The examples in this post also work for Amazon Aurora PostgreSQL-compatible edition.
Prerequisites
To work with features discussed in this post, you must create the mysql_fdw
extension in a PostgreSQL database by following the instructions provided in Part 1 of this series.
You also need the following resources:
- MySQL instance – This can be a self-managed instance or an Amazon Relational Database Service (Amazon RDS) for MySQL
- PostgreSQL instance – You can use an Amazon Aurora PostgreSQL-Compatible Edition or Amazon RDS for PostgreSQL database with the mysql_fdw extension For this post, we use Amazon RDS for PostgreSQL.
- Security group – Configure your security group to allow traffic between your source and the target databases.
Bulk loading foreign tables
In Part 1, we learned a simple way of creating a foreign table by running the following DDL template:
The following example creates a foreign table named ft_mysql_offices
in PostgreSQL using the MySQL source table mysql_offices
:
Creating foreign tables individually can be challenging due to the need to map data types and column collation matching between the target. Failure to ensure this compatibility may lead to semantic anomalies, as the remote server might interpret query conditions differently from the local server, resulting in unexpected or incorrect results. mysql_fdw
provides a mechanism to import all or a subset of tables of a schema into PostgreSQL with the IMPORT FOREIGN SCHEMA command. The following examples discuss bulk import of objects:
- Import objects of an entire schema – To import an entire schema from the remote server with hundreds of objects, you use the
mysql_fdw
featureIMPORT FOREIGN SCHEMA
. The following example imports all the tables in thehrdb
schema of MySQL into thepublic
schema in PostgreSQL:
- Selectively import tables in a schema – To import a subset of tables of a schema from the remote server to the local database,
mysql_fdw
supports the selective import of specific tables into the local database. The following example imports the tablesemp
anddept
in thehrdb
schema of MySQL into thepublic
schema of the PostgreSQL database:
- Import a table into multiple target schemas – You can import objects into multiple target schemas on the PostgreSQL database end. Importing tables into multiple schemas may help restrict user access to different PostgreSQL schemas. The following example creates the foreign tables
emp
anddept
in two different schemas,payroll
andfinance
, in a PostgreSQL database by using theIMPORT FOREIGN SCHEMA
command:
PostgreSQL extensions are database specific. If you have more than one database inside a PostgreSQL instance or cluster, you must create a separate mysql_fdw
extension, foreign server definition, and user mapping.
Create foreign tables with a subset of columns
A common requirement is to create a foreign table with a subset of columns of a table with a large number of columns from a remote database, as this allows you to access specific column data from remote MySQL/MariaDB sources without importing the entire table. This approach is useful for focusing on relevant information and potentially improving query performance by having only required column data. In such cases, you can create foreign tables with only the required columns. For example, the following table has eight columns on the MySQL end:
Let’s assume that you don’t need all of these fields in your PostgreSQL database and its application. In such cases, you can create a foreign table with only the necessary columns on the PostgreSQL side:
The following query on a PostgreSQL foreign table retrieves records from the target MySql table, selecting only the three columns that exist locally within the PostgreSQL database.
Create foreign tables on top of views
For data aggregation purposes or for performance reasons, you can create views in an RDS for MySQL database. The mysql_fdw
extension supports creating foreign tables on top of such views:
The following code snippet creates a view on a table films with subset of columns when run on MySQL database.
The view created in MySQL database can be used as source to create a foreign table in target PostgreSQL database. The following example creates the foreign table v_film
in the PostgreSQL database when executed in target PostgreSQL database system:
Create views on top of foreign tables
Views allow you to transform or reshape the data from the foreign tables without modifying the underlying data. You can perform calculations, combine data from multiple foreign tables, or apply complex logic within the view definition. You can also create views on top of foreign tables in the PostgreSQL database. The foreign table can even be a view on the MySQL end. The following example creates a view using the newly created foreign table v_film
:
Writing data to a remote MySQL / MariaDB database
After you create the foreign tables in the PostgreSQL database , you can start using the foreign tables as local tables and perform Data Modification Language (DML) operations. This enables you to perform cross-database operations and data integration between PostgreSQL and MariaDB/MySQL databases. Below are few of the use cases:
- Using a single interface to manage data residing stored in both PostgreSQL and MySQL
- Easier migration from MySQL/MariaDB to PostgreSQL
- Synchronizing data without complex ETL processes
- Near real-time updates on MySQL data from PostgreSQL applications
- Cross-database queries and analytics
For example, here’s a set of queries that demonstrates how we can update data in a MySQL table using the capabilities of the mysql_fdw
extension in PostgreSQL:
Writing to a foreign table (INSERT, UPDATE, or DELETE) requires the first column of the table to be unique. Otherwise the following error, is reported:
To address this, add a primary key or unique key on the first column of the table in the MySQL database.
Materialize foreign tables
Materializing foreign tables is a technique used to improve query performance and reduce network overhead. One of the key features that mysql_fdw
is the ability to support persistent connections to remove MariaDB/MySQL databases. After a query runs, the connection to the remote RDS for MySQL database is not dropped. Instead, it retains the connection for the next query from the same session. If you need to access data from the foreign tables frequently, avoid complex queries running on the remote server, and to reduce network overhead, you can create local materialized views:
Currently, materialized views in PostgreSQL aren’t updated when data in the base tables change. To store updates in your local materialized view, you will need to run a refresh command explicitly:
Understanding query performance when executing a remote MySQL query
You can use the PostgreSQL EXPLAIN
and EXPLAIN ANALYZE
commands to review your query’s performance. However, troubleshooting the performance of queries on foreign tables is challenging because queries run in the remote source system unless the data is materialized in the local database.
The following query plan shows the scan sent by the RDS for PostgreSQL instance to query data from the MySQL database. The keyword Foreign Scan
indicates that a foreign table is involved in the query. After the source is identified, you must check the remote server for the query plan.
The PostgreSQL EXPLAIN ANALYZE
plan shows only the query sent to the RDS for MySQL database with a foreign scan and a remote query. If the cost of the foreign scan is high, monitor the remote RDS for MySQL database resource utilization metrics including CloudWatch metrics, Performance Insights, check your RDS events that would affect the performance. Work with query optimization techniques on the remote MySQL instance along with improving query performance for RDS for MySQL with RDS optimized reads.
Bulk cleanup of foreign objects
The foreign data wrapper framework has the ability to clean up foreign objects with a single command using the CASCADE
feature. This is useful when you set up foreign objects for a temporary purpose, like data migration. PostgreSQL will walk through the dependencies and drop all foreign objects when you run the DROP EXTENSION
command with the CASCADE
keyword.
The following example drops the extension mysql_fdw
from the PostgreSQL database. Dropping the extension also removes the user mappings and the foreign objects that depend on the extension.
Clean up
The resources used in this solution will incur costs as long as they are running. You can perform a cleanup to free up these resources:
- Drop the foreign tables, views, and materialized views using the following commands:
- Drop the schemas created for testing. The following command deletes the
finance
andpayroll
schemas:
- Drop the user mappings and foreign server:
- Delete the RDS for PostgreSQL instance if you created one for this post.
- Stop your EC2 instance for MySQL if you created one for this post.
If you want to continue using this test environment, it’s recommended to create a billing alarm to help you stay within your budget.
Conclusion
PostgreSQL foreign data wrappers present a solution to connect to and query MySQL databases as if they were local tables from PostgreSQL databases that offers data integration, real-time data access, reduced data redundancy, leveraging PostgreSQL features between cross-databases. This also helpful in database migrations that support gradual or a phased migration offering coexisting data in both the environments for testing, data comparison, validation results in downtime for data migrations. In this post, we demonstrated different ways to create and manage foreign tables, understand the performance of queries involving foreign tables, and materialize the foreign tables for performance reasons in an RDS for PostgreSQL database.
Leave a comment if you have questions or feedback.
About the authors
Rohit Suraj Kapoor is a Lead Consultant – Database with AWS Professional Services in India, helping customers have a successful cloud adoption and migration journey. Throughout his career, Rohit has been dedicated to assisting customers with database-related challenges, including migration, design, and performance optimization.
P G Kumar Babu is a Senior Consultant with AWS ProServe, Global Competency Centre. His background spans a wide depth and breadth of expertise and experience in SQL and NoSQL database technologies. He is a Database Migration Expert and has developed many successful database solutions addressing challenging business requirements for moving databases from on premises to Amazon RDS and Amazon Aurora PostgreSQL and MySQL.
Source: Read More