This post was co-written with Sandeep Koppula, Senior Manager, Software Engineering, and Rahul Gupta, Senior Software Engineer, at Mindbody, and Mukesh Agrawal, Database Specialist, Senior Solutions Architect at AWS.
Mindbody is the leading cloud-based technology platform for the fitness and wellness industry, empowering businesses to grow and thrive. Through innovative solutions such as client booking, scheduling, integrated payments, marketing, and analytics, Mindbody streamlines operations and enhances customer engagement. Thousands of businesses across the fitness and wellness space rely on Mindbody for all-in-one management; millions of consumers book experiences with those businesses through the Mindbody application. By using cutting-edge cloud technology, Mindbody provides scalability, reliability, and continuous innovation.
Amazon Aurora PostgreSQL Optimized Reads is a performance enhancement feature specifically designed for Amazon Aurora PostgreSQL-Compatible Edition. It improves the efficiency of read operations by reducing latency and increasing throughput for queries. To learn more about this feature and its use cases, refer to Amazon Aurora Optimized Reads for Aurora PostgreSQL with up to 8x query latency improvement for I/O-intensive applications.
In this post, we highlight the scaling and performance challenges Mindbody was facing due to their rapid data growth. We also present the root cause analysis and recommendations for adopting Aurora Optimized Reads, outlining the steps taken to address these issues. Finally, we discuss the benefits Mindbody realized from implementing these changes, including enhanced query performance, significant cost savings, and improved price predictability.
“Upgrading to Aurora Optimized Reads brought meaningful advances to our operations. Despite minor adjustments needed during the transition, Amazon’s solutions and recommendations facilitated a smooth and low-friction migration. This shift has notably streamlined our database performance, cutting down query latency and allowing us to deliver quicker, more responsive service to our customers. The efficiency gains are coupled with cost savings, which allows us to reinvest in enhancing the customer experience and reinforcing our platform’s reliability.”
— Jacob Meacham, CTO at Mindbody
Current architecture
Mindbody’s email marketing platform operates on an Aurora PostgreSQL cluster to support their business needs. The cluster runs database version 13.8, is approximately 17 TB in size, and includes partitioned tables with billions of rows. To handle peak workloads, Mindbody uses the largest Aurora instances, and the workload is 80% read and 20% write.
The following diagram illustrates the current architecture.
Scaling and query performance challenges
Mindbody’s Marketing Suite application faced significant scaling and performance challenges due to architectural constraints and growing data demands. Despite Aurora’s support for read scaling with up to 15 read replicas, their legacy application stack (Ruby on Rails) lacks read/write splitting, directing all workloads to the writer node while using readers solely as failover targets. Complex dynamic SQL queries, involving joins across billions of rows in partitioned tables, further complicate query optimization efforts.
The absence of an integrated data warehouse forces reliance on the online transaction processing (OLTP) database, adding strain to the Aurora writer instance. With vertical scaling maxed out at db.r6i.32xlarge, the application suffers from slower query execution, extended response times, and limited scalability.
Due to these challenges, Mindbody sought guidance on how to further scale up the cluster to improve query performance—without undergoing the extensive process of query tuning, application rewrites, or database refactoring involving sharding.
Root cause analysis
After reviewing performance data in Amazon CloudWatch and Amazon RDS Performance Insights, along with cost analysis from AWS Cost Explorer, we uncovered several key findings.
Firstly, CloudWatch shows that the average BufferCacheHitRatio consistently remains below 80%. A healthy ratio is typically above 95%, and anything lower suggests that queries are frequently accessing disk rather than being served from the cache.
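The same signal can be cross-checked from inside PostgreSQL. The following is a minimal sketch using psycopg2 and the pg_stat_database view; the endpoint, database name, and credentials are placeholders for illustration, not Mindbody's actual configuration.

```python
import os
import psycopg2

# Placeholder connection details; substitute your own cluster endpoint and credentials.
conn = psycopg2.connect(
    host="my-aurora-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com",
    dbname="postgres",
    user="postgres",
    password=os.environ["PGPASSWORD"],
)

with conn.cursor() as cur:
    # blks_hit = reads served from shared buffers; blks_read = reads that went to storage.
    cur.execute("""
        SELECT datname,
               round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_pct
        FROM pg_stat_database
        WHERE datname = current_database();
    """)
    print(cur.fetchone())

conn.close()
```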
This can be verified in Performance Insights by looking at the top wait event IO:DataFileRead. This event occurs when a connection waits on a backend process to read a required page from storage because the page isn’t available in shared memory. When the needed data isn’t in memory, Aurora fetches it from storage, increasing load on DB instance CPU and network utilization, causing higher query latencies, and incurring I/O costs. To mitigate the impact of the network I/O latency involved in reading data pages from Aurora storage, Mindbody had configured larger instances, db.r6i.32xlarge, with memory matching their working datasets, to meet their business SLAs.
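For an ad hoc view of what active sessions are currently waiting on, a similar sketch can sample pg_stat_activity; again, the connection details are placeholders.

```python
import os
import psycopg2

# Placeholder connection details for illustration only.
conn = psycopg2.connect(
    host="my-aurora-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com",
    dbname="postgres",
    user="postgres",
    password=os.environ["PGPASSWORD"],
)

with conn.cursor() as cur:
    # Count active backends per wait event; a high share of IO:DataFileRead
    # indicates pages are frequently being fetched from Aurora storage.
    cur.execute("""
        SELECT wait_event_type || ':' || wait_event AS wait, count(*)
        FROM pg_stat_activity
        WHERE state = 'active' AND wait_event IS NOT NULL
        GROUP BY 1
        ORDER BY 2 DESC;
    """)
    for row in cur.fetchall():
        print(row)

conn.close()
```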
Mindbody expressed concerns about the cost implications of over-provisioned instances sized to accommodate memory requirements. The following chart highlights the CPUUtilization metric, showing that average CPU utilization on the primary writer instance remains below 20%, with occasional peaks.
In Cost Explorer, we found that the Mindbody Aurora cluster exhibits significant I/O intensity specific to read I/O. The monthly average I/O cost of this cluster accounts for approximately 48% of the total Aurora expenditure (we present a Cost Explorer graph illustrating this later in the post).
Reason for choosing Aurora Optimized Reads
Based on this root cause analysis, Mindbody chose to adopt Aurora Optimized Reads for three key reasons:
- It provides an out-of-the-box tiered cache capability that extends DB instance caching capacity by using the local NVMe storage.
- It provides a temporary objects capability out of the box. With this capability, temporary objects are hosted on NVMe storage, enabling better latency and throughput for queries that sort, join, or merge volumes of data too large to fit within the memory configured for those operations.
- Given their I/O-intensive workload, the Aurora I/O-Optimized configuration helps them achieve better price-performance.
Transitioning to Aurora PostgreSQL Optimized Reads
Enabling Aurora Optimized Reads requires upgrading the database cluster from version 13.8 to 14.9 or higher. Recognizing the technical complexity and the need for strict high availability, the Mindbody team conducted comprehensive testing in a proof-of-concept environment, gathering key performance metrics for business-critical queries, before implementing changes in the production cluster.
To minimize disruption in production, Mindbody used blue/green deployment and followed the recommended upgrade path—first updating the cluster to the latest minor version, then proceeding with the major version upgrade.
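The following is a minimal boto3 sketch of creating such a blue/green deployment; the deployment name, source cluster ARN, and Region are hypothetical, and optional settings such as target parameter groups are omitted for brevity.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # placeholder Region

# Create a blue/green deployment: the green environment is cloned onto the target
# engine version while the existing (blue) cluster keeps serving traffic.
response = rds.create_blue_green_deployment(
    BlueGreenDeploymentName="marketing-suite-pg14-upgrade",                     # hypothetical name
    Source="arn:aws:rds:us-east-1:123456789012:cluster:marketing-suite",        # hypothetical ARN
    TargetEngineVersion="14.9",
)
print(response["BlueGreenDeployment"]["Status"])
```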
To learn more about the limitations and considerations for blue/green deployments, see Limitations and considerations for blue/green deployments.
Process followed to create a proof-of-concept environment
The goal of creating a production-sized proof-of-concept environment was twofold:
- To perform comprehensive testing that demonstrated the benefits and to develop a detailed runbook for replicating the process in the production environment.
- To evaluate the impact on a representative set of top business-critical queries identified by the Mindbody team, which were run in the proof-of-concept environment to enable side-by-side performance comparisons.
The following diagram illustrates the workflow for creating the proof-of-concept environment.
The top business-critical queries were run side by side on both the blue (non-optimized reads) and green (optimized reads) Aurora clusters to capture runtimes. The following table presents a comparison of runtimes.
At this stage, the Mindbody team had gathered the necessary metrics to confidently proceed with adopting Aurora Optimized Reads for the production migration and prepared a detailed runbook to replicate the steps in the production environment.
Benefits of switching to Aurora PostgreSQL Optimized Reads
Following the runbook prepared during the proof of concept, the Mindbody team first upgraded the minor version from 13.8 to 13.12 (Step 2 in the preceding diagram), then performed the major version upgrade to 14.9 (Step 3) using an Aurora blue/green deployment. After switchover to the green environment, the Aurora cluster’s storage configuration was modified to the Aurora I/O-Optimized configuration (Step 4). This is an online operation and doesn’t require downtime.
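A sketch of this storage change (Step 4) using boto3 might look like the following; the cluster identifier and Region are placeholders.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # placeholder Region

# Switch the cluster's storage configuration to Aurora I/O-Optimized.
# This is an online change and doesn't restart the instances.
rds.modify_db_cluster(
    DBClusterIdentifier="marketing-suite",   # placeholder cluster identifier
    StorageType="aurora-iopt1",
    ApplyImmediately=True,
)
```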
At this stage, the Aurora cluster instances were ready to be replaced with Optimized Reads-compatible instances (an operation that requires downtime). To minimize disruption, the team first modified the reader instance to an Optimized Reads-compatible db.r6id.32xlarge instance (Step 5). At this point, the Aurora primary writer instance remained on a non-Optimized Reads instance type, while the reader instance had been modified to an Optimized Reads instance type.
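The instance class change in Step 5 could be performed with a call like the following sketch; the instance identifier and Region are placeholders.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # placeholder Region

# Move the reader to an NVMe-backed, Optimized Reads-compatible instance class.
# ApplyImmediately performs the change (with a brief outage for that instance)
# right away rather than waiting for the next maintenance window.
rds.modify_db_instance(
    DBInstanceIdentifier="marketing-suite-reader-1",  # placeholder instance identifier
    DBInstanceClass="db.r6id.32xlarge",
    ApplyImmediately=True,
)
```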
On June 21, Mindbody conducted a manual failover, swapping the roles of the writer and reader instances. This action successfully transitioned the primary (writer) instance of the Aurora cluster to an Optimized Reads-compatible configuration. After confirming the benefits, Mindbody modified the new reader (the old writer) instance to an Optimized Reads-compatible db.r6id.32xlarge instance as well.
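A manual failover of this kind can be initiated as shown in the following sketch; the cluster and instance identifiers are placeholders.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # placeholder Region

# Promote the Optimized Reads-compatible reader to writer via a manual failover.
rds.failover_db_cluster(
    DBClusterIdentifier="marketing-suite",                  # placeholder cluster identifier
    TargetDBInstanceIdentifier="marketing-suite-reader-1",  # placeholder reader to promote
)
```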
In this section, we discuss the key performance improvements and cost savings the Mindbody team realized after transitioning to Aurora PostgreSQL Optimized Reads.
Key performance improvements
The following are the key metrics collected after the writer instance began running on the Optimized Reads-compatible Aurora instance:
Mindbody reported a significant improvement in the runtime of their top modules. This improvement reduced overall failures while fetching contacts for automations and campaigns.
In the following screenshot, we observe that the CloudWatch average CPUUtilization metric showed 6% CPU with Optimized Reads compared to 12% CPU without Optimized Reads. Average daily CPUUtilization was reduced by 50%.
In the following screenshot, we observe that the CloudWatch ReadIOPS metric showed a 90% reduction with Optimized Reads when compared to without Optimized Reads.
In the following screenshot, we observe that the Performance Insights IO:DataFileRead wait event showed a significant reduction with Optimized Reads (0.81) when compared to without Optimized Reads (7.79).
The AuroraOptimizedReadsCacheHitRatio CloudWatch metric shows that, on average, 85% of read requests are being served by the Aurora Optimized Reads cache.
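This metric can also be pulled programmatically, for example with a boto3 sketch like the following; the instance identifier and Region are placeholders.

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # placeholder Region

# Retrieve the hourly average Optimized Reads cache hit ratio for the last 24 hours.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="AuroraOptimizedReadsCacheHitRatio",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "marketing-suite-writer"}],  # placeholder
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
    Period=3600,
    Statistics=["Average"],
)

for dp in sorted(response["Datapoints"], key=lambda d: d["Timestamp"]):
    print(dp["Timestamp"], round(dp["Average"], 2))
```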
Cost benefits
The detailed cost breakdown for the Aurora service is visible in Cost Explorer, providing insights into the tangible cost savings and price predictability realized by Mindbody in their Aurora usage.
Mindbody transitioned to Optimized Reads on June 21. The Cost Explorer graphs illustrate monthly Aurora costs by usage type over 6 months (3 months before and 3 months after the transition to Optimized Reads), showcasing an approximate 23% cost reduction starting in June. Notably, Aurora:StorageIOUsage charges are evident in the months of March to June. In contrast, from July onwards, there is no Aurora:StorageIOUsage charge, resulting in a more predictable overall cost structure despite a significant increase in InstanceUsageIOOptimized:db.r6id.32xlarge usage.
With a 50% reduction in CPUUtilization, Mindbody now has the opportunity to downsize their Aurora instances from db.r6id.32xlarge to db.r6id.24xlarge. This adjustment could unlock additional cost savings, allowing for more efficient resource utilization while maintaining optimal performance.
Conclusion
Mindbody effectively used Aurora Optimized Reads to overcome performance challenges and improve cost-efficiency in their latency-sensitive application managing large datasets. This enabled Mindbody to meet their performance SLAs more efficiently while reducing costs. This post highlighted how Aurora PostgreSQL Optimized Reads can deliver substantial performance improvements and financial benefits for read-heavy applications requiring low latency.
You can start using the Optimized Reads feature today by visiting the Amazon RDS console and spinning up a supported Aurora instance. For more information, refer to Improving query performance for Aurora PostgreSQL with Aurora Optimized Reads.
About the Authors
Sandeep Koppula is a Senior Engineering Leader at Mindbody with 18+ years of IT experience, including 7 years in Technical Architecture and 8 years in Technical Management. He specializes in architecting, designing, developing, and deploying complex, high-volume, and scalable enterprise applications across diverse domains such as Fitness & Wellness services, Email Marketing, FinTech, PropTech, LMS/LXP, e-Commerce, Entertainment Industry Payroll, and Credit Bureau.
Rahul Gupta is a Senior Software Engineer at Mindbody with over 8 years of experience in full-stack development and cloud technologies. He has contributed to building scalable systems, improving application performance, and optimizing workflows. Proficient in ReactJS, Redux, Ruby on Rails, PostgreSQL, and AWS, Rahul has supported initiatives like enabling Mindbody to deliver over a billion emails and texts annually.
Mukesh Agrawal is a Database Specialist, Senior Solutions Architect at AWS, helping customers design scalable, optimized and innovative database solutions to maximize the value of AWS services.