Data migration from Oracle Database to Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition can require updates to the database schema or SQL commands used to access the data. To assist migrations, AWS provides the standalone AWS Schema Conversion Tool (AWS SCT) or DMS Schema Conversion in the AWS Management Console to help you convert existing database schema from one database engine to another and AWS Database Migration Service (AWS DMS) to help migrate your data to AWS securely with minimal downtime.
Although the tooling helps automate the migration effort, there are scenarios where a developer needs to intervene and manually convert SQL from Oracle to PostgreSQL. In this post, we discuss configurations in AWS DMS endpoints and AWS DMS tasks to migrate spatial columns from Oracle to Aurora PostgreSQL-Compatible efficiently.
Spatial data overview
Spatial data, also known as geospatial data, refers to information that identifies the geographic location and characteristics of natural or constructed features and boundaries on the Earth.
This data is often represented in terms of Cartesian coordinates (x,y) for two-dimensional maps, but might also include altitude (z) for a three-dimensional representation. Spatial data can come in various forms, including points (such as GPS coordinates), lines (such as roads or rivers), and polygons (such as borders and land use zones).
The concept of spatial data is integral to a variety of applications that require an understanding of how different elements relate to each other within a geographical space.
Oracle spatial data
Oracle Spatial and Graph is intended to simplify spatial data management and make it more intuitive for users of location-aware applications and geographic information system (GIS) software. Upon storage of spatial data within an Oracle database, said data can subsequently be manipulated, retrieved, and associated with all other data resident in the database. Oracle Spatial implements an object-relational model to represent geometries, storing entire geometries in the native Oracle spatial data type SDO_GEOMETRY. Oracle Spatial uses a two-level query model to resolve spatial queries and spatial joins. Integration of spatial indexing functions directly into the Oracle Database engine constitutes a pivotal capability of Oracle Spatial.
Aurora PostgreSQL spatial data
Aurora PostgreSQL-Compatible offers PostGIS extensions for working with geospatial data that allow you to treat that data as first-class objects in your database. For more information about supported geometric types, see Geometric Types.
The Oracle spatial data type SDO_GEOMETRY
gets converted as data type GEOMETRY
. For more details, refer to Geometries.
Solution overview
In this post, we discuss the setup required on the source Oracle database and target Aurora PostgreSQL database for migrating data with spatial columns. We cover the configurations required for the source and target AWS DMS endpoints, as well as the task settings required for a serverless AWS DMS replication task to migrate an Oracle table containing an SDO_GEOMETRIC
column to an Aurora PostgreSQL table.
The following diagram illustrates solution architecture.
Prerequisites
To get started, you must have the following prerequisites:
- An active AWS account.
- An Oracle instance (for this post, we use Amazon RDS for Oracle) or an on-premises Oracle database with tables having spatial columns.
- An Amazon Aurora PostgreSQL Serverless or RDS for PostgreSQL database (for this post, we use Aurora PostgreSQL Serverless). If you don’t already have an Aurora PostgreSQL Serverless instance, you can create one. For instructions, refer to Creating a DB cluster that uses Aurora Serverless v2.
- AWS SCT installed on your local machine or Amazon Elastic Compute Cloud (Amazon EC2) instance to convert and apply the schema to the target Aurora PostgreSQL database. For more information, see Installing and Configuring AWS Schema Conversion Tool. You can also use the fully managed AWS DMS Schema Conversion service directly in the AWS Management Console without downloading and executing AWS SCT (for this post, we use AWS SCT standalone tool).
Build sample tables on the source
For this post, we use an RDS for Oracle database and create a sample table called STORES
using the following DDL from the Oracle documentation:
Later in this post, we provide you a script to expand your dataset. The script is designed to add 100 new entries to the stores
table. You have the flexibility to adjust the number of rows inserted based on your specific requirements or use case.
Set up your source Oracle database
In this section, we discuss two settings that are needed to make sure the spatial columns are migrated successfully to the target:
- For migrating Oracle databases version 12.1 or earlier to PostgreSQL using AWS DMS, a function is required to convert the Oracle
SDO_GEOMETRY
spatial data type to GeoJSON format. By default, AWS DMS calls a customSDO2GEOJSON
function, which must be present and accessible to the AWS DMS user—this is mandatory. ASDO2GEOJSON
function can be found on some third-party websites, but it might not work for all spatial data and AWS DMS does not maintain these custom functions. It is advised that a custom function is created according to your business requirements. Starting in Oracle 12.2, built-in GeoJSON support was added, eliminating the need for a custom function. - Because AWS DMS uses
SDO2GEOJSON
to normalize spatial columns in GEOJSON format, this operation is going to happen at the source in an Oracle temporary tablespace. Make sure the temporary tablespace at the source is appropriately sized. For Amazon RDS for Oracle specific information, refer to How do I resize the tablespace for my Amazon RDS for Oracle DB instance?
Set up your target Aurora PostgreSQL database
When using Aurora Serverless as target, scaling ACUs up or down during the full load phase of an AWS DMS task requires a complete reload of all tables that are currently being loaded. To avoid this, make sure that your Aurora Serverless cluster has sufficient capacity units to prevent auto scaling during the full load phase of the AWS DMS replication task as highlighted on the AWD DMS console. We have set the minimum capacity unit to 64. After the full load is complete, you can reset the minimum Aurora capacity as needed.
This configuration is applicable only when you are using Aurora Serverless as target.
Now you’re ready to create the PostGIS extension on the target. The PostGIS extension includes several related extensions that work together to provide geospatial functionality. For this post, we create just the basic extension. Depending on your use case, you might need more extensions. For more details, see Managing spatial data with the PostGIS extension. Creating the extension can be achieved in either of the two ways: manually or using AWS SCT.
To manually create the extension on the target, use the following statement:
Alternatively, you can allow AWS SCT to connect to database servers to automatically convert tables with SDO_GEOMETRY
objects into PostGIS geometry types (explained in the next section). When using AWS SCT, it can automatically create the necessary extension pack when you apply the target database script by choosing Apply to database from the context menu. AWS SCT applies the extension pack before it applies all other schema objects. This way, you already have the add-on module that emulates functions present in a source database that are required when converting objects to the target database. Refer to Using extension packs with AWS Schema Conversion Tool for more information.
Validate the extension was created using the following command:
Prepare the table for migration using AWS SCT
In this post, we used AWS SCT to prepare the table structure at the target. You can manually create the table structure at the target as well.
The following source DDL is supplied to AWS SCT:
This is converted to the following target table by AWS SCT by default:
Notice how the geometry column gets automatically converted to a PostgreSQL-compatible data type.
For this post, we mapped the store_id
column’s Oracle data type NUMBER to PostgreSQL data type INTEGER in order to keep the right precision at the target because by default AWS SCT changed it to NUMERIC data type. To do so, we add a transformation rule under the conversion settings of AWS SCT. To perform accurate data type mapping for the Oracle NUMBER column to the ideal data type column in PostgreSQL, refer to Convert the NUMBER data type from Oracle to PostgreSQL.
Now the final DDL for the stores
table looks like the following code:
If the table isn’t present in the target, AWS DMS creates the target table, irrespective of the table preparation mode selected in AWS DMS task. As a prerequisite, make sure to use CREATE EXTENSION
statements to load the PostGIS extensions before you start the AWS DMS task as mentioned earlier.
Migrate spatial column data using AWS DMS
In this section, we walk through the steps to migrate your spatial column data.
Create the AWS DMS endpoints
Create AWS DMS endpoints for the source and target database. AWS DMS endpoints provide the connection, data store type, and location information about your data store.
Create your source endpoint
For instructions to create your Oracle source endpoint, refer to Using an Oracle database as a source for AWS DMS.
Apart from default connection configurations, additional endpoint setting needs to be added if you’re running Oracle version 12.1 and earlier. For all new versions of Oracle, there is no need for this extra connection attribute:
"SpatialDataOptionToGeoJsonFunctionName": "sdo2geojson"
Create the target endpoint
For instructions to create your PostgreSQL target endpoint, refer to Using a PostgreSQL database as a target for AWS Database Migration Service.
Apart from default connection configurations, additional endpoint settings can be added for efficient AWS DMS processing:
- “MaxFileSize” – Specifies the maximum size (in KB) of any CSV file used to transfer data to PostgreSQL. This is not mandatory but will help you increase the performance of the full load. You can go in increments of 64, 128, 256, to a maximum of 1 GB. The default value is 32,768 KB (32 MB).
- “AfterConnectScript= SET session_replication_role = replica” – This attribute has AWS DMS bypass foreign keys and user triggers to reduce the time it takes to bulk load data.
Create the AWS DMS Serverless migration task
AWS DMS Serverless is a feature that provides automatic provisioning, scaling, built-in high availability, and a pay-for-use billing model to increase operation agility and optimize your costs. The serverless feature eliminates replication instance management tasks like capacity estimation, provisioning, cost optimization, and managing replication engine versions and patching.
For instructions to create your migration task, refer to Creating a serverless replication. Make sure to review AWS DMS Serverless limitations.
For migrating spatial geometry data, AWS DMS uses full LOB mode, which migrates all LOB data in your tables, regardless of size. Full LOB mode provides the convenience of moving all LOB data in tables, but the process can have a significant impact on performance.
AWS DMS migrates LOB data in two phases:
- AWS DMS creates a new row in the target table and populates the row with all data except the associated LOB value
- AWS DMS updates the row in the target table with the LOB data
This makes it mandatory for the source table to have a primary key or unique key. If the table doesn’t have a primary key or unique key, data will not be migrated. If necessary, alter the table at the source to add the primary key constraint.
Additionally, all geometry columns on the target table must be nullable, because AWS DMS treats geometry columns as LOB. Alter the table definition at the target to drop NOT NULL constraints for the geometry columns if present. When data migration is complete, revert the changes made to the target columns using the alter command.
Use the following settings for the replication task:
- Set replication type as Full load.
- Set target table preparation mode as Do nothing.
- Enable Amazon CloudWatch Logs under the task settings so you can debug issues.
- For this example, we set Max DMS capacity units (DCU) as 32 DCU (v8 CPU and 64 GB). Minimum DCU was left blank.
- The following JSON shows an example of the mapping rules for the DMS task. Replace the schema name with the name under which the table is created.
Use table parallelism settings for loading large volumes of data in an efficient manner. For more information about the various options, refer to AWS DMS best practices for moving large tables with table parallelism settings.
Run the task. When the task is complete, you will see 100 rows inserted for the stores
table in the database.
Verify the table is loaded in the target database as well.
Troubleshooting common errors
If the PostGIS extension isn’t installed, you will encounter the following error when you run the AWS DMS task:
[TARGET_LOAD ]E: RetCode: SQL_ERROR SqlState: 42704 NativeError: 1 Message: ERROR: type "geometry" does not exist;
Error while executing the query [1022502] (ar_odbc_stmt.c:5007)
[TARGET_LOAD ]E: execute create table statement failed, statement
CREATE TABLE "xxxxx"."stores" ( "store_id" DECIMAL(38,10) NOT NULL, "description" VARCHAR(100), "street_address" VARCHAR(40), "city" VARCHAR(30), "state_province_code" VARCHAR(2),
"postal_code" VARCHAR(9), "store_geo_location" geometry )
[1022502] (odbc_endpoint_imp.c:4694)
After the PostGIS extension is installed, this error will be resolved.
If your source table’s spatial columns are defined as NOT NULL and you’re using AWS DMS to create the table, the AWS DMS task will fail and data will not get loaded. This is because the spatial columns need to allow the NULL value. In this case, drop the NOT NULL constraint on the source table before you start the AWS DMS task for data migration:
Alternatively, you can manually create the target table structure with spatial columns defined as nullable so that AWS DMS doesn’t create the table when you start the AWS DMS task for data migration.
Reference script for test data generation
You can use the following PLSQL script to generate 100 rows of dummy data for testing. Update array FOR i IN 1..100 LOOP
to generate more records if required.
Clean up
To avoid unnecessary charges, clean up any resources that you built as part of this architecture that are no longer in use.
- Delete AWS DMS Serverless replications along with source and target endpoint from the AWS DMS console.
Additionally, after the migration is complete, if you don’t need to retain the source and target databases or tables, you can drop them as well.
Conclusion
In this post, we showed how you can migrate Oracle SDO_GEOMETRY
spatial columns to Aurora PostgreSQL Serverless using AWS DMS. You can also apply this solution when migrating to other databases that support geometric data types. The AWS SCT can help create the target metadata, and AWS DMS handles continuous data migration and replication to minimize downtime.
If you have any questions or suggestions about these migration techniques, leave your feedback in the comments.
About the Authors
Sandeep Gunjotikar is a Data Architect with Amazon Web Services(AWS) Professional Services. At AWS, his role is to build a strong partnership with customers to strategize, modernize, and implement data architectures (Data Lake, Lake House, Data Mesh, etc) & new distributed computing solutions such as Data and Business intelligence, machine Learning and batch/real-time data processing, that often span private cloud and public cloud services.
Vanshika Nigam is a Solutions Architect on the Database Migration Accelerator team at Amazon Web Services. With over 6 years of experience working with Amazon RDS and AWS DMS she serves as an Amazon DMA Advisor helping AWS customers accelerate the migrations of their on-premises, commercial databases to AWS Cloud database solutions.
Source: Read More