Imagine you’re managing a high-traffic application powered by an Amazon Relational Database Service (Amazon RDS) for PostgreSQL database. Everything is running smoothly until users report that they can’t make any transactions, and your monitoring tools are overwhelmed with alerts. As you examine the logs, you discover the issue: autovacuum has fallen behind, leading to performance problems and putting your database’s availability at risk.
For any Database Administrator, this is a challenging situation. Autovacuum isn’t just a background task, it’s the very foundation of your PostgreSQL database’s performance and availability. It automates essential maintenance work such as reclaiming storage by removing outdated row versions (dead tuples), updating visibility map of your tables for faster index-only scans, and gathering statistics to optimize query performance. Most importantly, autovacuum performs a special purpose operation called freezing to prevent transaction ID (XID) wraparound.
To explore autovacuum’s role and behavior in depth, refer to Understanding Autovacuum in Amazon RDS for PostgreSQL.
In this post, we introduce postgres_get_av_diag(), a new function available in RDS for PostgreSQL to monitor aggressive autovacuum blockers. By using this function, you can identify and address performance and availability risks through actionable insights provided by postgres_get_av_diag()
. An aggressive vacuum operation conducts a comprehensive scan of all pages within a table, including those typically skipped during regular vacuum operations, this thorough scan aims to “freeze†transaction IDs approaching their maximum age, effectively preventing transaction ID wraparound. We’ll also guide you on leveraging an AWS Lambda function to log data on aggressive autovacuum blockers to Amazon CloudWatch and send email alerts using Amazon Simple Email Service (Amazon SES) when blockers are detected. Additionally, we provide instructions for using Amazon EventBridge to schedule the Lambda function periodically. The postgres_get_av_diag()
function is currently supported in Amazon RDS for PostgreSQL version 17.1 and higher.
Before we dive into understanding and setting up the postgres_get_av_diag()
function and our proactive monitoring approach, let’s examine the key factors that can prevent autovacuum from freezing and cause it to fall behind.
Why can autovacuum fall behind?
Autovacuum can lag for various reasons:
- Long-running transactions or idle-in-transaction statements
- Uncommitted prepared transactions
- Inactive or delayed logical replication slots
- Long-running queries on read replicas with
hot_standby_feedback
enabled - Unvacuumed temporary tables
Additionally, autovacuum can be prevented from freezing when it encounters tables with logical inconsistencies in indexes or physical issues in database pages. Refer to Autovacuum Monitoring and Diagnostics for more information.
If autovacuum falls behind and the database approaches the critical 2-billion unfrozen transactions threshold, PostgreSQL will halt new write transactions as a protective measure to prevent data corruption. Although this safeguard helps prevent further XID wraparound risks, it also results in database downtime until resolved.
To monitor this critical threshold and take preemptive action, you can rely on the CloudWatch metric, MaximumUsedTransactionIDs
, which tracks the number of unfrozen transaction IDs. By setting up alerts on this metric, you can identify when your database is nearing the danger zone. For more information, refer to Implement an Early Warning System for Transaction ID Wraparound in Amazon RDS for PostgreSQL. While the CloudWatch metric provides early warnings, postgres_get_av_diag()
offers a deeper diagnostic tool to identify and resolve aggressive vacuum blockers.
In the following sections, we discuss how postgres_get_av_diag()
works, how to set it up, and how can you can use postgres_get_av_diag()
and proactively monitor aggressive vacuum blockers.
How postgres_get_av_diag() works?
The postgres_get_av_diag()
function works in conjunction with adaptive autovacuum and the autovacuum_freeze_max_age
setting. By default, postgres_get_av_diag()
identifies blockers only beyond the adaptive autovacuum threshold (typically 500 million unfrozen transactions). However, if your database’s autovacuum_freeze_max_age
is set higher, postgres_get_av_diag()
respects that value, preventing premature alerts. This makes sure the monitoring threshold adapts dynamically, using either the custom or system setting (whichever is higher), enabling effective autovacuum monitoring without interfering with database operations.
The maximum allowable setting for autovacuum_freeze_max_age
in Amazon RDS for PostgreSQL is 750 million transactions.
Set up postgres_get_av_diag()
To use the function, you are required to install the rds_tools
extension. Use the following command:
After the extension is installed, the function is created under the rds_tools
schema. Run the postgres_get_av_diag()
function and monitor autovacuum activity using the following query:
The query retrieves information about any detected autovacuum blockers, sorted in descending order by the autovacuum_lagging_by
column. This column represents the age of the blockers, with the oldest blocker listed first, indicating the priority for resolution.
When a blocker exceeds the threshold (500 million unfrozen transactions by default), the function generates an output similar to the following. In this example, a logical replication slot is preventing autovacuum from freezing transactions:
Refer to Resolving identifiable vacuum blockers in RDS for PostgreSQL for more examples.
The following sections show a solution for proactive monitoring.
Solution overview
A Python-based Lambda function, provided in this post, uses the postgres_get_av_diag()
function to identify and report aggressive autovacuum blockers. It dynamically obtains connectivity details and securely retrieves database credentials from AWS Secrets Manager before establishing a connection using pg8000. By running postgres_get_av_diag()
, the Lambda function identifies aggressive autovacuum blockers and logs the results to CloudWatch. The same blocker information is sent through Amazon SES to your verified email address. Using Amazon EventBridge, the Lambda function is periodically invoked, ensuring continuous monitoring and timely notifications for any detected blockers.
Prerequisites
Make sure the following prerequisites are in place prior to running the Lambda function.
Amazon EC2
Use an Amazon Elastic Compute Cloud (Amazon EC2) Linux instance to download necessary dependencies and package the Lambda function environment. This allows you to install Python packages like pg8000 and Boto3 directly onto your EC2 instance and bundle them with your Lambda deployment package. Optionally, you can use Amazon EC2 for setting up the necessary AWS Identity and Access Management (IAM) environment for the Lambda function.
AWS CLI
The AWS Command Line Interface (AWS CLI) is required for setting up resources, such as IAM roles, policies, and CloudWatch event rules. Make sure the AWS CLI is installed and configured with your AWS account credentials.
For more information, refer to Installing or updating the latest version of the AWS CLI and Configuring settings for the AWS CLI.
Runtime dependencies
The Lambda function requires the following libraries to interact with PostgreSQL and AWS services:
- pg8000 for PostgreSQL database access
- boto3 for interacting with AWS services, such as Secrets Manager and Amazon SES
For guidance on packaging dependencies, see Working with .zip file archives for Python Lambda functions.
Secrets Manager
AWS recommends using Secrets Manager for managing passwords, because it provides built-in encryption, automatic rotation, and fine-grained access control. As required by the Lambda function, store the database user’s password in Secrets Manager.
For more information, refer to Using AWS Secrets Manager secrets in AWS Lambda functions.
Database access
Make sure the Lambda function has network access to your database, which typically involves configuring it within the same virtual private cloud (VPC) and security groups as your RDS instance. When creating the Lambda function using the AWS CLI, you can use the --vpc-config
option.
For more information, refer to Automatically connecting a Lambda function and a DB instance and Tutorial: Using a Lambda function to access an Amazon RDS database.
Amazon SES configuration
Confirm that the email address or domain for sending alerts through Amazon SES is verified. Amazon SES requires verification for any email address or domain used to send email in production environments.
For more information, refer to Creating and verifying identities in Amazon SES.
Environment variables
Set up the following configuration variables in the Lambda function’s Environment variables section. These variables allow the function to dynamically access configuration details without hard coding them in the code.
- PostgreSQL connection details (
PGHOST
,PGPORT
,PGUSER
,PGDATABASE
) to establish a connection to the database - Amazon SES (
SENDER_EMAIL
,RECIPIENT_EMAIL
) to facilitate sending alert emails - Secrets Manager (
SECRET_NAME
) to retrieve the database password.
When creating the Lambda function with the AWS CLI, you can include the --environment
option as follows:
For more information, refer to Using Lambda environment variables to configure values in code.
IAM permissions for Lambda execution role
The Lambda execution role must have the following permissions to access necessary AWS services:
- secretsmanager:GetSecretValue – Enables Lambda to retrieve the database password stored in Secrets Manager. For more information, see Use AWS Secrets Manager secrets in AWS Lambda functions.
- AWSLambdaBasicExecutionRole (managed policy) – Allows Lambda to log query results and other diagnostics information to CloudWatch, supporting historical tracking and troubleshooting. For more information, see Access to CloudWatch Logs and AWSLambdaBasicExecutionRole.
- ses:SendEmail – Enables Lambda to send email alerts through Amazon SES, notifying stakeholders of any detected issues. For more information, see How do I use Lambda and Amazon SES to send email?
Deploy the Lambda function
Use the following code for the Lambda function. Open any text editor and save it in a new directory as lambda_function.py
.
Download the runtime dependencies using the following commands in the same directory where the function is saved, then zip the directory. The Lambda function uses python3.12; make sure Python 3 or above and pip
are installed locally before running these commands.
Create the Lambda function. The following is an example command to create the Lambda function. Replace the placeholder values with your specific details as needed:
Periodic monitoring
To provide continuous monitoring for aggressive autovacuum blockers, you can use EventBridge to schedule regular executions of Lambda function. Follow these steps to set up periodic monitoring (replace the placeholder values with your specific details as needed):
- Use
put-rule
to create a rule specifying the--schedule-expression
: - Use
put-targets
to link the EventBridge rule to your Lambda function by specifying its Amazon Resource Nam (ARN) as the target. This step makes sure the EventBridge rule invokes your Lambda function each time it runs. - Use
add-permission
to allow EventBridge to invoke your Lambda function. This step gives EventBridge the necessary permissions, with the rule ARN specified as the--source-arn
.
Refer to Tutorial: Create an EventBridge scheduled rule for AWS Lambda functions for more information.
Validate the setup
To validate that the setup is working, first check the Lambda function. On the Lambda console, find your function, and confirm recent successful invocations according to your EventBridge rule frequency.
To verify the CloudWatch logs, on the CloudWatch console, navigate to the log group /aws/lambda/YourLambdaFunctionName
and filter for AUTOVACUUM_MONITOR_LOG
to confirm logs are being generated. The following screenshot shows an example; av-blocker-logger-notify
is the name of the Lambda function used for this post (click to expand).
Each time a blocker is identified, the same blocker information is automatically sent to your email address verified by Amazon SES. You can confirm by checking your email account. The following screenshot shows an example email.
Cleanup
To avoid incurring unnecessary charges, make sure to delete all the resources created along this post. Terminate the EC2 instance used to create the Lambda function. Delete the EventBridge rule by using the AWS Management Console or AWS CLI. Then delete the Lambda function and its associated IAM role. Also, remove the CloudWatch Logs log group (/aws/lambda/YourLambdaFunctionName
) that was created for the Lambda function’s logging. If you no longer need the Amazon SES email configuration, you can remove the verified email addresses.
Finally, review your CloudWatch dashboard and delete any alarms or metrics that you created specifically for this monitoring solution.
Conclusion
The postgres_get_av_diag()
function offers an effective method for identifying aggressive autovacuum blockers in RDS for PostgreSQL. This post presented a solution using postgres_get_av_diag()
to enable proactive monitoring. With EventBridge, a Lambda function is configured to run periodic checks for autovacuum blockers. When blockers are found, Amazon SES sends an email alert, and integration with CloudWatch Logs provides extensive logging, supporting in-depth historical analysis of database performance.
With this monitoring solution, you can reduce the risk of transaction ID wraparound and prevent unexpected downtime. The postgres_get_av_diag()
function is currently supported in Amazon RDS for PostgreSQL version 17.1 and higher.
If you have any questions or feedback, share them in the comments section.
About the author
Naga Appani is a Database Engineer at Amazon Web Services and Subject Matter Expert for Amazon RDS for PostgreSQL, with over a decade of experience in working with relational databases. At AWS, he is mainly focused on PostgreSQL deployments. He works with developers on bringing new features and assisting customers in resolving critical issues with Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL database systems.
Source: Read More