Whether you’ve used DynamoDB for a day or a decade, the question of which database version you’re running has no practical relevance. As a serverless database, DynamoDB doesn’t have a version. DynamoDB has had no version upgrades, no maintenance windows, no patching, and no downtime due to maintenance since launching in January 2012. You access new DynamoDB features as they become available, with no database downtime or interruptions to your application. This concept may be new to developers who are accustomed to running database solutions that require routine version updates (minor and major), operating system (OS) patches, or instance maintenance and upgrades.
In this post, Druva, a leading provider of data security, describes its DynamoDB journey, including how it evolved its workloads over the last 12 years with zero downtime. We also describe the recurring challenges you might face maintaining version parity with instance-based database engines, and how the serverless architecture of DynamoDB provides greater resilience than instance-based database solutions.
Druva: Achieving zero-downtime with DynamoDB
Druva helps enterprise customers around the world secure and recover data from all threats. Druva intelligently unifies backup, disaster recovery, archival, and governance capabilities into a single, optimized data set. This centralized data protection increases the availability and visibility of business-critical information, while reducing the risk, cost, and complexity of managing and securing it.
Druva backs up customer sources such as virtual and physical machines, NAS files, and cloud applications in Amazon Simple Storage Service (Amazon S3). Our architecture decouples backup data and its associated metadata to independently optimize each storage class for performance and cost. We store system metadata as key-value pairs. In 2012, we chose DynamoDB as our key-value database because it provides single-digit millisecond latency at any scale, enabling us to quickly recover backed-up customer data when needed. As a managed service, DynamoDB eliminated management overhead for our CloudOps team, while allowing for capacity changes and configuration adjustments without any downtime. DynamoDB has scaled and worked seamlessly since we deployed it, even when workloads experienced sudden bursts of traffic.
What’s most interesting is a benefit that we didn’t anticipate when first evaluating DynamoDB. Because DynamoDB transparently applies database and OS updates, we haven’t incurred any of the downtime that we would have experienced with other database solutions. When DynamoDB releases new capabilities or deploys security improvements, we’re not required to take a maintenance window or follow an upgrade path to use them. This seamless experience lets us test and use new features without the friction of taking our database offline to upgrade to the latest version. For example, we changed some of our tables from provisioned throughput to on-demand throughput with no impact to our applications. We find that on-demand throughput is more cost efficient for our development and test workloads because they have variable traffic. With on-demand throughput, we don’t have to trade off performance, scalability, or reliability to achieve better cost efficiency. We also began using DynamoDB transactions to maintain data consistency across multiple tables, which is critical in backup and recovery scenarios. Lastly, adopting BatchGetItem and BatchWriteItem significantly improved performance by allowing multiple items to be read or written in a single operation.
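To illustrate the kinds of operations described above, the following sketch uses the low-level boto3 DynamoDB client. The table names (`Backups`, `Catalog`), the key attribute (`pk`), and the helper functions are hypothetical illustrations, not Druva’s actual schema. The client is passed in as a parameter so the calls can be demonstrated without an AWS account; in real use you would create it with `boto3.client("dynamodb")`.

```python
def switch_to_on_demand(client, table_name):
    """Move a table from provisioned to on-demand capacity.

    This is a control-plane call; the table stays fully available
    while the billing mode changes.
    """
    return client.update_table(
        TableName=table_name,
        BillingMode="PAY_PER_REQUEST",
    )


def record_backup(client, backup_id):
    """Write backup metadata to two tables atomically with a transaction.

    Either both Put operations succeed or neither does, which keeps the
    two tables consistent in backup and recovery scenarios.
    """
    return client.transact_write_items(
        TransactItems=[
            {"Put": {
                "TableName": "Backups",
                "Item": {"pk": {"S": backup_id},
                         "status": {"S": "COMPLETE"}},
            }},
            {"Put": {
                "TableName": "Catalog",
                "Item": {"pk": {"S": backup_id},
                         "indexed": {"BOOL": True}},
            }},
        ]
    )


def fetch_metadata(client, backup_ids):
    """Read up to 100 items in a single round trip with BatchGetItem."""
    keys = [{"pk": {"S": b}} for b in backup_ids]
    return client.batch_get_item(
        RequestItems={"Backups": {"Keys": keys}}
    )
```

Injecting the client also makes these functions easy to unit test with a stub before pointing them at a real table.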
We are happy to share that Druva has never experienced downtime since the implementation of DynamoDB as our high-speed metadata storage in 2012. Additionally, we haven’t received a single email from AWS alerting us about a mandated maintenance window for our DynamoDB tables. As a result of this increased resilience, Druva can provide stronger Service Level Agreements (SLAs) to our customers. SLAs are a key differentiator in our industry because customers rely on Druva in recovery scenarios such as a critical file being accidentally deleted, or mission-critical applications that require immediate restoration. We’ve also seen that our developers are more efficient when using DynamoDB over instance-based database solutions. Because DynamoDB handles the complexities of service maintenance and scaling, our development team spends more time innovating and improving core product features than managing infrastructure. This shift in focus accelerated product development cycles and enabled faster responses to market demands.
Maintaining version parity for your database engine
Let’s revisit the question that we initially posed in the blog post intro. For customers self-managing database workloads, understanding your current major version and your upgrade path to the latest version has perpetual relevance. New capabilities are introduced through major versions, and developers must plan for, allocate, and prioritize engineering resources to perform the required version updates.
Updating your database engine to apply major version updates can be time consuming and error prone, and often requires downtime. Major version updates also require testing to avoid performance regressions. To perform these tests, customers might copy their database to a staging environment and test the new database version in a production-like setting. When performance requirements are verified in the staging environment, production traffic is switched over. Traffic can also be rolled back to the previous version if issues arise. Though useful, this practice introduces cost and overhead. The impact of major version updates on developer hours, TCO, and availability is compounded when multiple database clusters are in scope, or when database clusters contain many instances. In some cases, customers choose to delay updating a database version to prioritize other engineering projects. Furthermore, customers who archive database backups must consider the version used in each backup. Database version discrepancies between archives and production workloads can negatively impact the recovery process.
DynamoDB and zero-downtime maintenance
For relational database workloads, these maintenance windows can add up to 30-60 minutes or more of downtime per cluster annually. In comparison, the always-on architecture of DynamoDB increases application availability by transparently applying database and system updates with zero downtime. Customers such as Druva have shared that DynamoDB helps them improve their SLAs, reduce developer overhead, and respond to evolving business requirements by adopting new capabilities without the friction of version updates. Beyond version updates, DynamoDB also applies security patches, scales compute and storage capacity, and replaces failed hardware with zero downtime.
When DynamoDB launched in January 2012, we externalized the same technology that met Amazon’s needs for a highly reliable, ultra-scalable NoSQL database for use cases such as the shopping cart and session service. As we build new DynamoDB innovations to meet evolving customer requirements, we intentionally design new APIs to be backward compatible with existing APIs and resources. As a result of this design practice, DynamoDB launches new capabilities with zero impact on existing workloads. For example, DynamoDB launched deletion protection in 2023 so you can protect your tables from accidental deletes. This capability launched with support for both new and existing tables. Regardless of how old your DynamoDB table is, you incur no downtime by adopting new DynamoDB capabilities.
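Turning on deletion protection for an existing table is a single control-plane call. A minimal sketch with the boto3 client API follows; the table name is a hypothetical placeholder, and the client is injected so the function can be shown without an AWS account (in practice, `boto3.client("dynamodb")`).

```python
def enable_deletion_protection(client, table_name):
    """Enable deletion protection on an existing table.

    While the setting is on, DeleteTable calls against the table are
    rejected; the table itself stays available throughout the change.
    """
    return client.update_table(
        TableName=table_name,
        DeletionProtectionEnabled=True,
    )
```

Because the same `update_table` operation accepts `DeletionProtectionEnabled=False`, the setting can be reversed later without recreating the table.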
Conclusion
In this post, we described how DynamoDB increases application uptime and resilience by transparently applying database updates with no impact to customer workloads. DynamoDB increased Druva’s developer efficiency by removing the need to worry about database lifecycle management. The application resilience gained from having no database maintenance windows strengthened Druva’s competitive position, improved customer satisfaction, and drove revenue growth through existing and new market opportunities.
To learn more about DynamoDB, refer to the Amazon DynamoDB Developer Guide. To learn more about Druva, see https://www.druva.com/.
About the authors
Somesh Jain, a Distinguished Engineer at Druva and IIT Roorkee alumnus, has left an indelible mark on data management technology. His innovative solutions, including proxy pool data backup, file system usage correction, and dynamic file chunking, have earned multiple patents. As a key member of Druva’s Foundation Team, Somesh architected the company’s Storage Engine, built on a proprietary file system that ensures efficient backups, secure cloud storage, and robust data loss prevention. His contributions have revolutionized object storage-based indexing and file systems, setting new industry standards. Beyond his professional achievements, Somesh is an avid Rubik’s cube enthusiast, boasting an impressive personal best of 20 seconds for solving a 3×3 cube.
Jason Laschewer is an Outbound Product Manager on the Amazon DynamoDB team. Jason has held a number of Business Development roles in non-relational databases at AWS. Outside of work, Jason enjoys seeing live music, cooking, and spending time with his wife and three children in NY.
Smita Singh is a Senior Solutions Architect at AWS.