Customers running Oracle workloads choose Amazon Relational Database Service (Amazon RDS) Custom for Oracle to access their database environment and operating system, while maintaining the administrative ease, durability, and scalability of a managed service. Access through RDS Custom for Oracle enables the customizations that your workloads demand. Application Continuity is one such customization that you can implement on RDS Custom for Oracle. Application Continuity helps critical workloads such as banking applications with strict Recovery Point Objective (RPO) and Recovery Time Objective (RTO) meet their availability requirements.
In this post, we show you how to implement Application Continuity in an RDS Custom for Oracle environment using a sample application. We also show you how to test the implementation to see that, when an outage occurs at the database tier, the application recovers and resumes without any data loss—automatically and transparently—along with the database failover. Finally, we show you how to verify the results before cleaning up the environment.
What is Application Continuity?
Application Continuity (AC) is a feature that enables replay of a request against an Oracle database after a recoverable error that makes the database session unavailable. As a result, the outage appears to the user as no more than a delay in the request. Client requests can contain transactional and non-transactional work. After a successful replay, the application can continue where that database session left off. With AC, the user experience is improved by masking many outages—planned and unplanned—without the application developer needing to attempt to recover the request. Oracle Active Data Guard supports AC, and RDS Custom for Oracle lets you configure Oracle Active Data Guard. See Ensuring Application Continuity and Application Continity for Java to learn more about AC. On the client side, AC supports thin Java-based applications, OCI and ODP.NET based applications with support for open source drivers, such as Node.js and Python. See the Application Checklist for Continuous Service for MAA Solutions to know more about configuring clients.
Solution overview
For this solution, we have two RDS Custom for Oracle instances, one configured as the primary and the other as the physical standby, in an Oracle Active Data Guard configuration. The instances are running in two different Availability Zones within the same AWS Region to provide high availability. The observer instance is running in a third Availability Zone. The observer instance also acts as the database client, and hosts the sample Java application, so we don’t need another Amazon Elastic Compute Cloud (Amazon EC2) instance. Also, we open the standard ports for Oracle Notification Service (ONS) to propogate database events only between the database on the database servers and the connection pool on the client host. Auto-ONS is turned on as a result, so no manual configuration of ONS is necessary.
There is no restriction on the placement of these instances. All of them could be in the same Availability Zone, or spread out in different Regions. It depends on the high availability and disaster recovery protection you are looking for and your constraints.
The solution has the following components:
The Active Data Guard configuration runs in the maximum availability mode
The Data Guard broker runs on all the three instances
The observer process runs on the observer instance
The observer instance also has Java JDK 11, along with the JAR files needed for AC (ojdbc11.jar, ucp11.jar, and ons.jar)
The following diagram illustrates the architecture.
To implement and test this solution, you complete the following high-level steps:
Configure an Oracle database service to support AC.
Create database triggers to make sure the AC service runs automatically and only on the primary DB instance.
Configure the database connect string to support AC.
Enable Fast Application Notification for all three instances.
Verify that the Data Guard configuration is up and running with Fast-Start Failover enabled and observer running.
Create necessary database objects for the test.
Create a sample Java application incorporating AC enablement.
Run the Java application, and as it progresses, initiate a database failover by crashing the primary database (using shutdown abort), interrupting the program’s progress.
Watch the log file of the observer process as the failover happens.
Watch the application as it resumes automatically after the failover without error.
Connect to the database and verify that the transactions are complete and error free.
Prerequisites
To follow along, you must have the following:
Two RDS Custom for Oracle instances in an Oracle Active Data Guard configuration and an EC2 instance with Oracle 19c client software to serve as the observer as well as the client. A Data Guard broker is running with Fast-Start Failover enabled, and the observer is started. For instructions, see Build high availability for Amazon RDS Custom for Oracle using read replicas. Alternatively, to build it using AWS CloudFormation templates, see Automate high availability setup in Amazon RDS Custom for Oracle. The preceding procedures help you create a standby in the same Region as the primary. To use different Regions, you need to build the standby manually. RDS Custom for Oracle is a bring your own license (BYOL) service. You should make sure you have Active Data Guard option along with an Oracle EE license. See Amazon RDS Custom pricing to learn about AWS charges.
Java JDK 11 installed on the EC2 instance created for the observer in the preceding step (the sample program given here works with Java JDK 17 as well). The code uses a thin JDBC driver and an Oracle Universal Connection Pool (UCP) to connect to the database.
JAR files ojdbc11.jar, ucp11.jar, and ons.jar on the observer EC2 instance (downloaded from Oracle Database JDBC driver and Companion Jars Downloads).
Port 6200 open for inbound traffic in the security groups of all three EC2 instances, allowing traffic only from those three instances. Auto-configured Oracle Notification Service (ONS) uses this port for intra-node communication. If you want to use a different port, you also have to start ONS explicitly and keep it running on all the three instances.
Create the database services
Connect to the primary database as an admin user through SQL*Plus, and using the DBMS_SERVICE package, create a database service that supports AC. We set the service name to DGACO in this example. The network name, the name of the service as used in SQLNet connect descriptors for client connections, can be different, but for ease of use, we use the same name. Although the values of the first six parameters are necessary to enable AC, you can adjust the numerical values of the following parameters as needed. See the following code:
Create a database trigger
From the same session, create a database trigger on the primary database and enable it. This trigger verifies that the AC service created in the preceding step runs automatically and only on the primary DB instance at a given time. See the following code:
Edit the tnsnames.ora file to define a connect string for the AC service
Use the following code to edit the tnsnames.ora file:
Enable Fast Application Notification and ONS
Fast Application Notification (FAN), the feature that enables the application to respond to the FAN events, needs the ONS ports to be opened on the database and the client hosts. Modify the inbound rules of the security groups of all three EC2 instances to allow port 6200.
There are no code changes to use FAN. FAN is automatically configured and enabled out of the box. You need to confirm appropriate settings on the client side. See the appendix of the FAN whitepaper for details. When the client connects, the Oracle database reads the URL or TNS connect string and automatically configures FAN at the client.
Verify a valid Data Guard configuration
If you have fulfilled the prerequisites discussed earlier, the Data Guard configuration should be valid and running. If necessary, use DGMGRL to restart the observer to capture the logs in a file:
The output should be similar to the following:
Create database objects for the sample application
Connect to the primary database as an admin user in SQL*Plus and run the following:
Connect to the primary database as the user ACDEMO in SQL*Plus and run the following:
Create a sample application
In the observer instance (which is also the database client in this case), create a Java program and save it as ACDemo.java using the following sample code. When compiled and run, the program connects to the database user ACDEMO and runs a nested loop of transactions. The outer loop inserts a parent transaction with a single INSERT statement and commits, and the inner loop step incorporates a series of INSERT statements without a commit in between, representing individual steps of the parent transaction referenced earlier. When testing, you will simulate a failure during the running of the inner loop, so you will have some INSERT statements that have run but not been committed at the time of the database failure.
The code also incorporates other requirements for enabling AC:
Draining – For planned maintenance, the recommended approach is to provide time for current work to complete before maintenance is started. You do this by draining work. Using an Oracle connection pool is one of the methods available for draining. You will use Oracle’s Universal Connection Pool (UCP) here. Always return the connections to the connection pool, and use connection tests (ValidateConnectionOnBorrow = true).
Fast Connection Failover (FCF) – The FCF feature is a FAN client implemented through the connection pool. Enable FCF (setFastConnectionFailoverEnabled = true).
Connect wait timeout – Set connect wait timeout (ConnectionWaitDuration = 3).
Auto-commit – Disable auto-commit (CONNECTION_PROPERTY_AUTOCOMMIT = false).
JDBC statement cache – Use the JDBC statement cache (CONNECTION_PROPERTY_IMPLICIT_STATEMENT_CACHE_SIZE = 100).
Create your application with the following code:
This is a sample program (a modified version of code from database-heartbeat.com), employing a minimalistic approach to demonstrate AC in its basic form. For other production uses, you should develop your own approach to employ all checks and features as dictated by your enterprise standards and business use case. Also, do not use hard-coded passwords in the code, and follow necessary policies while choosing the passwords. Finally, you should explore other associated features such as Fast Application Notification, Oracle Notification Service, Fast Connection Failover, Transaction Guard, Database Draining, and the extended feature of AC called Transparent Application Continuity (TAC) before proceeding further.
Compile the code using the following commands:
Run the application and invoke a primary database failure
From the same terminal (T1), run the application:
While the application is running, go to another terminal (T2) connected to the primary database as SYSDBA in SQL*Plus and run the following command:
The following screenshots show the T1 terminal (left) as the application waits, and the T2 terminal (right) of the forced database shutdown.
Monitor the database failover
In another terminal (T3), connected to the observer node, monitor the observer log file using the following command:
The failover starts after FastStartFailoverThreshold property value, which is set to 15 seconds here.
After a few seconds, the failover will be complete.
Watch the application resume and finish
Now that the failover has succeeded, go back to terminal T1. In the following screenshots, the application resumes and finishes (left) and the original primary database is still down (right).
Verify transactions
Now you can connect to the ACDEMO user in SQL*Plus and issue the following commands:
The result should show that all transactions, including all their steps, are complete with a delay of under 1 minute between them, but without any errors. In our example, the second transaction was submitted when the session was connected to the primary database (on ip-10-0-0-36), and the replay actually ran on the new primary database (on ip-10-0-0-168). You can see when the database failover occurred from the time stamps of the second transaction’s step 2 (14:56:07) and step 3 (14:56:58), and that the application resumed within a minute on the new primary DB instance.
Clean up
The services and components that you used for this solution are limited to the two RDS Custom for Oracle instances and the third EC2 instance for the observer. To clean up the resources you created, delete the RDS Custom instances and stop the EC2 instance with the observer.
Summary
In this post, you learned how to implement Application Continuity for your Oracle workloads in an RDS Custom for Oracle environment. You also learned how to test the implementation using a sample Java application. In the test, when you simulated a database failure, not only did the database fail over, but the application also failed over—automatically and transparently. Lastly, you verified that the application ran successfully without any interruptions or errors.
If you have questions or comments, please leave them in the comments section.
About the Author
Govardhanan Ramachandran is an Enterprise Support Lead Technical Account Manager at AWS. As part of AWS Energy, he helps his customers in the energy sector drive their adoption and use of AWS services. He provides them technical guidance to plan and build solutions using best practices, and to strive for maximum value from AWS. Govardhanan likes reading books and watching movies in his spare time.
Source: Read More