Monitoring databases is essential in large IT environments to prevent potential issues from becoming major problems that can result in data loss or downtime. Having custom dashboards and alarm-based monitoring for the database can help in analyzing historical metrics patterns and improve database availability by alerting users of any abnormal threshold breaches. When choosing where to run Oracle databases on AWS, you have three main options: fully managed Amazon Relational Database Service (Amazon RDS) for Oracle, self-managed Oracle database on Amazon Elastic Compute Cloud (Amazon EC2) instances, and a hybrid approach with Amazon RDS Custom for Oracle, offering a balance of manageability and control. You can select the right option based your requirements.
Amazon RDS for Oracle uses Amazon CloudWatch for monitoring operating system (OS) metrics, Amazon RDS Performance Insights for monitoring database performance metrics, and Enhanced Monitoring for monitoring more in-depth OS metrics. EC2 instances and RDS Custom for Oracle databases use CloudWatch to monitor server-level metrics. Although these monitoring are comprehensive in monitoring the health of the database and server, you might need to monitor other database metrics like tablespace free space, long-running queries, and so on to avoid downtime. You can monitor these metrics either using Oracle Enterprise Manager or your own custom monitoring solution. Oracle Enterprise Manager is Oracle’s on-premises management platform that provides a single dashboard to manage your Oracle deployments, in your data center or in the cloud. The base installation of Oracle Enterprise Manager is free, but if you want to use the advanced features offered by Oracle Enterprise Manager, you need to purchase the appropriate management packs. Alternatively, you can use your own custom monitoring solution to monitor database metrics.StatsD is a popular open source solution that can gather metrics from a wide variety of applications that can be used to set up custom database monitoring. It is a simple NodeJS daemon that listens for messages on a UDP port and parses the messages, extracts metrics data, and periodically flushes the data to a CloudWatch agent, which is published to a CloudWatch dashboard.
In this post, we show you how to set up monitoring for your Oracle database using StatsD. This solution provides monitoring flexibility at lower cost.
Solution overview
In this use case, we use StatsD to monitor database-related metrics like tablespace free space, inactive sessions, database locks, and more in a CloudWatch dashboard for an Oracle database on AWS, which could be an RDS for Oracle, RDS Custom for Oracle, or Oracle database on Amazon EC2.
You can visualize StatsD metrics in CloudWatch as time series graphs, which can be used for alarming any threshold breaches. You can use it to continuously monitor and analyze database issues. For more information, see Retrieve custom data using StatsD.
The following are the high-level steps to set up monitoring with StatsD. You can install the CloudWatch agent on either the same server where the database is running or a separate EC2 instance.
- Set up CloudWatch with StatsD enabled on Amazon EC2.
- Create an application file to fetch the required monitoring metrics from the database.
- Validate if the StatsD metrics are sent to CloudWatch.
- Create a CloudWatch alarm for the required metrics.
- Test the alarm to verify the setup.
- Optionally, create a service to automatically start and stop the StatsD daemon.
The following diagram illustrates the solution architecture for custom monitoring for Amazon RDS for Oracle using StatsD.
You can use an EC2 instance to host the StatsD application code for creating the daemon. The code can be written in multiple language (Node, Scala, Go, Haskell, Ruby, and Python). We use Python for this use case. The database connect string and credentials are stored in AWS Secrets Manager.
The workflow consists of the following steps:
- Step 1 – When the daemon runs, it first retrieves the database credentials and connection details from Secrets Manager
- Step 2 and 3 – The StatsD daemon fetches the data at regular intervals using UDP protocol from the database
- Steps 4, 5, and 6 – StatsD sends the data to CloudWatch, which can be used for visualization or alarming
This solution offers the following benefits:
- It’s straightforward to install and comes with a CloudWatch agent
- StatsD clients are thin, carry no state, don’t need threads, and add negligible overhead
- Decoupling the application means StatsD can’t crash your application, and it doesn’t need to be written in the same language or even run on the same machine
- You can define the required metrics that need to be monitored in the application
- You can analyze the metrics on a CloudWatch dashboard for efficient database management
Prerequisites
Set up the following prerequisites for StatsD installation:
- Create a target Oracle database if you don’t have one already. It can be an Oracle database on Amazon EC2, Amazon RDS for Oracle, or RDS Custom for Oracle.
- Provision an EC2 instance where the CloudWatch agent with StatsD will be installed. This is specifically required for an RDS for Oracle database because it’s a fully managed service where you don’t have access to the underlying instance. For RDS Custom for Oracle and an Oracle database on Amazon EC2, you can install the agent on the database server.
- To set up the CloudWatch agent on Amazon EC2, attach the following permissions to the AWS Identity and Access Management (IAM) role created for CloudWatch installation. Refer to Create IAM roles to use with the CloudWatch agent on Amazon EC2 instances for more details.
- The AWS managed policy
AmazonSSMManagedInstanceCore
. - The AWS managed policy
CloudWatchAgentServerPolicy
. - The following inline policy with permission to run
GetSecretValue
:
- The AWS managed policy
- Set up the CloudWatch agent on RDS Custom for Oracle and provide additional privileges for tagged RDS Custom for Oracle resources. This includes
Cloudwatch:PutMetricData
andEc2:DescribeTags
.
Install the CloudWatch agent and set up custom database monitoring
In this section, we walk through the steps to install the CloudWatch agent and configure the custom database metrics monitoring using StatsD on an EC2 instance:
- Install the CloudWatch package by entering the following command on the EC2 instance:
For RDS Custom for Oracle, the CloudWatch agent is already installed, so this step is not required.
After you have downloaded the CloudWatch agent, you must create the configuration file before you start the agent on any servers. You can create it by using the wizard or creating it from scratch. If you’re creating it using the wizard, you must pass the required arguments. For this post, we demonstrate using the wizard.
- Create the configuration file:
- Pass the values required for the parameters. You can use the defaults except for a few parameters listed in the following code:
- Set up the CloudWatch agent using the configuration file you created:
- Start and enable the CloudWatch agent:
- Check the CloudWatch agent status:
- Verify the CloudWatch agent log file for any errors:
- Check for the StatsD process communication with CloudWatch:
If you’re using RDS Custom and get an error, review your permissions as instructed in the prerequisites section.
CloudWatch supports the following StatsD format:
- Use the following command to send the custom metrics to CloudWatch:
- On the CloudWatch console, navigate to the agent you created and review the Metrics.The previous command sends the metric called
TestConfig
with value (integer) as50
and metric type asCounter
.
- Run the following command to install the
python3
executable andoracledb
packages on the EC2 instance: - You can use Secrets Manager to store the database credentials. For more details, see Store Database Credentials in AWS Secrets Manager.
- Create a file called
config.py
. The file contains the database connection details that need to be monitored. Update the hostname, database name, and secret Amazon Resource Name (ARN) with the database connection details. The interval parameter defines how long the StatsD code will sleep after executing the SQL on the database (for this example, it’s set to 60 seconds). - Create another file called
dbmon.py
. This file contains the database attributes that need to be monitored. In the following example, we monitor tablespace free space and the count of the database session status. You can change this depending on your requirements and add multiple parameters to it. To change the monitoring attribute, replace the following SQL query with your own custom query. It uses the credentials configured inconfig.py
to connect to the database. Because the StatsD daemon is running on the same server the IP address,127.0.0.1
is used to point to the local machine.In the preceding code, we have added the code for monitoring the tablespace free space and the count of the database session status by assigning the respective SQL to the variables
sql1
andsql2
. You can add additional metrics with SQL in the preceding code. - Run the Python script:
- To verify the metrics on the CloudWatch console, navigate to the metric type.The following screenshot illustrates the tablespace free space metric.
The following screenshot illustrates the database session status count metric.
You can also view these metrics as time series graphs, and create alarms for notification.
- To create CloudWatch alarms on these metrics, refer to Create a CloudWatch alarm based on a static threshold.
Best practices
Consider the following best practices:
- Make sure you provide the IAM role only the access to the specific resources.
- Use Secrets Manager to store the password and never save it in the configuration file.
- Follow security best practices for deploying the solution on an EC2 instance. For more details, refer to Best practices for Amazon EC2.
Troubleshooting guide
You might experience the following connectivity issue while running dbmon.py
:
In this case, modify the security group on the DB instance to include inbound access for the monitoring EC2 instance.
Additionally, you might encounter an access issue while running dbmon.py
:
To address this issue, add the permission for GetSecretValue
to the IAM role.
Conclusion
In this post, we showed how you can use StatsD to monitor database-specific metrics for an Oracle database on AWS. You can visualize these metrics in a CloudWatch dashboard for analysis and remediation. For more details on the different StatsD parameters, refer to Retrieve custom metrics with StatsD.
About the Authors
Abhishek Kumar Verma is a Senior Database Consultant at Amazon Web Services ProServe. He has deep expertise in database migration and administration on Amazon RDS for Oracle, Amazon Aurora PostgreSQL and Amazon RDS SQL Server databases. He is also a subject matter expert in AWS DMS, Oracle GoldenGate, and Oracle Exadata. He works closely with customers to help migrate and modernize their databases and applications to AWS on a daily basis. When not working, Abhishek loves meditation and spending time with nature.
Source: Read More