This post is co-written with JeongHun Kim from Samsung Electronics.
Samsung Cloud is a cloud-based service that provides backup and restore, synchronization, and sharing of user data, as well as device authentication, for all Samsung devices, including Galaxy smartphones around the world.
This blog post introduces five approaches Samsung Cloud has taken to continuously lower the total cost of ownership (TCO) for Amazon DynamoDB since migrating from Apache Cassandra to DynamoDB in 2015.
Samsung Cloud’s scale and high-level architecture
At AWS re:Invent 2017, Samsung Cloud shared how migrating 860TB of data from self-managed Cassandra to DynamoDB lowered our TCO by 40%. Since then, Samsung Cloud has scaled our DynamoDB workloads to 3.5PB of data and over 100 billion reads and writes per day. As a serverless, fully managed database with single-digit millisecond performance at any scale, DynamoDB enabled Samsung Cloud to scale these workloads with no downtime for adding storage and compute capacity. Because DynamoDB has no version upgrades, no maintenance windows, no patching, and no downtime maintenance, Samsung Cloud has also achieved increased resilience over using an instance-based solution. Similarly, we have adopted features to optimize cost as they are released with no downtime or maintenance windows, including TTL, Auto Scaling, and Standard-Infrequent Access. To put it simply, Samsung Cloud benefits from DynamoDB’s ongoing innovations without any impact to our workloads – we can apply new capabilities to our existing tables as they’re released.
The ability to adopt new DynamoDB capabilities with zero friction has helped Samsung Cloud evolve and optimize our DynamoDB use cases over the last nine years. The following image describes the requirements and scale of the core Samsung Cloud services backed by DynamoDB: Service, User, and Basic modules:
Service module – This module is responsible for synchronizing data across multiple devices and sharing user data between multiple users. With strongly consistent reads, DynamoDB lets us read the latest data from tables and local secondary indexes (LSIs).
User module – DynamoDB is used for tasks such as authenticating devices and caching user data from other databases. In an instance-based database, cleaning up expired data is costly because the targets must be searched for and removed on the writable instances. With DynamoDB TTL, we achieve this easily at no extra cost.
Basic module – Samsung Cloud uses DynamoDB as a metadata store to handle objects at scale, which is a core function of Samsung Cloud. DynamoDB delivers consistent single-digit millisecond latency at any scale.
Samsung Cloud has to handle large data volumes and scale dynamically. Because it is hard to predict user traffic at scale, DynamoDB’s elasticity and scalability are essential for Samsung Cloud. A table can also be pre-warmed if necessary.
In the following sections, we share five approaches that Samsung Cloud employed to optimize DynamoDB costs.
Modeling
At the time of migration in 2015, Samsung Cloud had to transfer 860 TB from Cassandra, and we adopted the method of mapping one entity to one DynamoDB table. Although this approach has operated without major service issues, we always recognized the need to align our data models with the DynamoDB data modeling guidance. Consequently, we have actively adopted single-table design since 2022.
Since then, most new applications have been designed using a single-table design, which has led to two key benefits:
Optimization of throughput costs – When designing a service, multiple entities are usually defined. For example, if 10 entities are defined to build a new module, 10 tables must be created with the existing design. Provisioned capacity units (CUs) then need to be set for each table, and each table’s consumed CUs need to be monitored. However, if you design the 10 entities as a single table, the different workloads are combined and flattened, and because there is only one table, only a single provisioned capacity setting is required. This in turn leads to reduced throughput costs.
Reduction in operational costs – Samsung Cloud believes that efficient data modeling has a significant impact on engineering hours. Just as in relational database management systems or Cassandra, not all tables hold the same amount of data or receive the same traffic. When DynamoDB tables are created per entity, there are both high-use tables and low-use tables. In Samsung Cloud, high-use tables incur high costs, but service issues rarely occur because traffic is steady. Low-use tables, on the other hand, incur little cost, but sudden traffic spikes can cause throttling under Samsung Cloud’s DynamoDB auto scaling policy. Therefore, monitoring, alarms, and management policies are required for underused tables, resulting in additional operating costs. With a single-table design, low-use entities share the provisioned CUs that are largely driven by the high-use entities, so the additional monitoring, alarms, and management policies are no longer needed and most throttling issues disappear, which is a big advantage. Furthermore, because there are fewer tables to monitor and manage, operational costs go down.
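The throughput argument above can be sketched with a small calculation. This is a simplified illustration, not Samsung Cloud's actual sizing method: the entity names and hourly consumed-CU figures are hypothetical, and the sketch ignores auto scaling, target utilization, and partition-level limits. It only shows that when entities peak at different times, the peak of the combined traffic is lower than the sum of the individual peaks.

```python
# Hypothetical consumed CUs per entity over four time slots (illustrative only).
workloads = {
    "entity_a": [100, 400, 100, 100],
    "entity_b": [100, 100, 400, 100],
    "entity_c": [50, 50, 50, 300],
}


def per_table_capacity(workloads):
    """One table per entity: each table must cover its own peak."""
    return sum(max(series) for series in workloads.values())


def single_table_capacity(workloads):
    """One shared table: capacity only has to cover the combined peak."""
    combined = [sum(values) for values in zip(*workloads.values())]
    return max(combined)


separate = per_table_capacity(workloads)
shared = single_table_capacity(workloads)
print(f"one table per entity: {separate} CUs, single table: {shared} CUs")
```

The combined peak can never exceed the sum of the per-entity peaks, so the single table needs at most the same capacity, and less whenever the peaks do not coincide.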
DynamoDB auto scaling
Samsung Cloud has applied the DynamoDB auto scaling feature to all tables since its launch in 2017. When we first applied auto scaling, we checked the actual usage of each table to determine its minimum capacity unit value. This method was inefficient because processing many tables individually required a lot of time and manual effort. We needed an auto scaling policy that could be applied to all tables in common. When establishing this policy, we took into account the adaptive capacity and burst capacity features, which make it possible to accommodate uneven data access patterns and unexpected requests in provisioned capacity mode, and set very aggressive values:
Minimum CUs (MIN): 20
Maximum CUs (MAX): The Region maximum
Target utilization: 88%
From a cost perspective, the minimum capacity and target utilization figures above are the ones to note: a table with no traffic keeps only 20 CUs, and capacity is increased only when the consumed CUs come very close to the provisioned CU value. We chose these values to reduce costs even at the risk of throttling, and throttling does in fact occur in Samsung Cloud services. However, to provide service stability, we applied defense logic, a fail-fast strategy, and a retry policy at the application level, taking into account the requirements of each application.
Additionally, provisioned capacity mode and auto scaling are used in production environments, and on-demand capacity mode is used in staging environments for verification or testing purposes because traffic isn’t constant.
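The effect of the policy values in this section can be sketched with simple arithmetic. The following is a minimal approximation of target tracking, not the actual auto scaling algorithm (which reacts to sustained CloudWatch metrics over time). The `MAX_CU` value is a hypothetical stand-in for "the Region maximum", and the function name is our own.

```python
import math

MIN_CU = 20          # minimum CUs from the policy in this section
TARGET_PERCENT = 88  # target utilization from the policy in this section
MAX_CU = 40_000      # hypothetical placeholder for "the Region maximum"


def desired_provisioned_cu(consumed_cu, target_percent=TARGET_PERCENT,
                           min_cu=MIN_CU, max_cu=MAX_CU):
    """Approximate the provisioned CUs that keep utilization at the target.

    Simplification: provisioned = consumed / target utilization,
    clamped to the [min_cu, max_cu] range.
    """
    desired = math.ceil(consumed_cu * 100 / target_percent)
    return max(min_cu, min(max_cu, desired))


for consumed in (0, 10, 880, 5_000):
    provisioned = desired_provisioned_cu(consumed)
    print(f"consumed={consumed:>5} -> provisioned={provisioned:>5}")
```

With an 88% target, only about 12% of provisioned capacity is kept as headroom, which is why the text above describes the values as aggressive and pairs them with application-level retries.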
Time to Live
In 2015, when Samsung Cloud first migrated to DynamoDB, there was no Time to Live (TTL) feature, so expired data was deleted by a separate batch job. When the DynamoDB TTL feature was first released, the batch job already in use was working well, so we judged that the benefit of converting to TTL was small compared to the effort required. However, as the service continued to grow, more data accumulated than was deleted, and the storage size began to grow exponentially. Additionally, running the separate batch jobs required provisioning Amazon Elastic Compute Cloud (Amazon EC2) instances and consumed RCUs and WCUs in DynamoDB, which incurred significant additional costs.
The cost of deleting expired data from a database using a typical batch job increases exponentially as the service scales. To solve this problem, Samsung Cloud enabled the TTL feature on the DynamoDB tables where expired data was being deleted through a batch job. The results were dramatic:
The storage size was reduced by approximately 94% from 1.2 PB to 74.5 TB
RCU usage was reduced by approximately 60% and WCU usage was reduced by approximately 70%
The overall costs for the tables with TTL enabled were reduced by approximately 90%
The following graph shows the number of items deleted by TTL from a specific table in Samsung Cloud. At peak times, more than 5 million items per minute are deleted at no cost. For Samsung Cloud, TTL is one of the main reasons for using DynamoDB.
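To make the mechanism concrete, here is a minimal sketch of how an application can prepare items for TTL. DynamoDB TTL reads a Number attribute holding a Unix epoch timestamp in seconds. The attribute name `expires_at`, the item fields, and the helper names are hypothetical, not Samsung Cloud's schema. Because TTL deletion runs in the background and is not immediate, the sketch also includes a read-side check for items that have expired but have not yet been removed.

```python
from datetime import datetime, timedelta, timezone

TTL_ATTRIBUTE = "expires_at"  # hypothetical attribute name configured as the TTL attribute


def with_ttl(item, retention_days, now=None):
    """Return a copy of the item with an epoch-seconds expiry timestamp."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(days=retention_days)
    return {**item, TTL_ATTRIBUTE: int(expires.timestamp())}


def is_expired(item, now=None):
    """Read-side check for items that expired but are not yet deleted."""
    now = now or datetime.now(timezone.utc)
    return item.get(TTL_ATTRIBUTE, float("inf")) <= int(now.timestamp())


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
item = with_ttl({"PK": "USER#123", "SK": "CACHE#profile"}, retention_days=30, now=created)
print(item)
print(is_expired(item, now=created + timedelta(days=31)))
```

Writing the expiry timestamp at insert time replaces the scan-and-delete batch job: no extra reads or writes are issued by the application to remove the data.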
Reserved capacity
With DynamoDB, you can reduce CU costs by up to 77% by purchasing reserved capacity. The red line in the following figure is the reserved capacity purchased by Samsung Cloud in 2022, and the green graph is the actual CU usage, showing that the usage forecast behind the purchase made a year earlier was accurate.
Samsung Cloud uses three steps to determine reserved capacity purchases:
Collection – We collect CU usage for the past three to six months by Region, exclude overused dates such as batch jobs or events, and consider increases or decreases in usage according to Samsung Cloud’s service plan for the next year.
Forecasting – Daily and monthly minimum and maximum CU values are derived, differences are analyzed, and the total CU usage for a year is predicted based on the minimum or maximum CU usage.
Calculation – The purchase volume is calculated as a percentage (100%, 90%, or less) of the predicted usage for each Region. After the purchase amount is determined, we make the purchase before the previous year’s reserved capacity expires.
After the purchase is applied, we monitor how much of the actual usage is covered by the purchased reserved capacity, and if coverage is insufficient, we purchase additional volume as part of our FinOps practice.
Samsung Cloud purchases reserved capacity on a yearly basis and achieves a discount rate of approximately 50% every year. Even if we purchase more reserved capacity than actual usage, we believe that the benefits are greater, so we purchase aggressively.
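The three steps above can be sketched as a small calculation. This is an illustrative simplification, not Samsung Cloud's actual forecasting model: the daily figures, the growth and coverage percentages, and the function name are hypothetical, and the block size of 100 CUs is an assumption that should be checked against the current reserved capacity purchase terms.

```python
def reserved_cu_to_purchase(daily_cu, excluded_days, basis="min",
                            growth_percent=100, coverage_percent=100,
                            block_size=100):
    """Estimate reserved CUs to buy for one Region.

    Collection:  drop outlier days (batch jobs, events).
    Forecasting: take the min or max of the remaining days and apply growth.
    Calculation: apply the coverage percentage and round down to whole blocks.
    """
    usable = [cu for day, cu in daily_cu.items() if day not in excluded_days]
    if not usable:
        raise ValueError("no usage data left after exclusions")
    baseline = min(usable) if basis == "min" else max(usable)
    forecast = baseline * growth_percent // 100
    target = forecast * coverage_percent // 100
    return target // block_size * block_size


# Hypothetical daily CU usage; the third day is an event and is excluded.
daily_cu = {
    "2023-03-01": 10_500,
    "2023-03-02": 10_200,
    "2023-03-03": 30_000,
    "2023-03-04": 10_900,
}
event_days = {"2023-03-03"}

conservative = reserved_cu_to_purchase(daily_cu, event_days, basis="min",
                                       growth_percent=110, coverage_percent=90)
aggressive = reserved_cu_to_purchase(daily_cu, event_days, basis="max",
                                     growth_percent=110, coverage_percent=90)
print(conservative, aggressive)
```

Choosing the maximum as the basis corresponds to the aggressive purchasing described above, where buying somewhat more than actual usage is accepted in exchange for the discount.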
Standard-Infrequent Access
Lastly, the DynamoDB Standard-Infrequent Access (Standard-IA) table class was announced at AWS re:Invent 2021. Samsung Cloud carries out a cost-optimization task for all resources every six months, and Standard-IA was considered as part of this task. After checking the cost structure of all tables, we found many tables where storage accounted for more than 50% of the table cost, the threshold at which switching to Standard-IA is recommended, with some reaching 80–90%. Because costs can change at any time depending on RCU and WCU usage, Samsung Cloud established its own standard to select only the tables where cost simulation results showed an effective cost reduction. About 50 tables were finally selected. Because Standard-IA only changes the cost structure, with no impact on table performance, durability, or availability and no changes to existing application code, it can be applied at any time using the AWS Management Console, AWS CloudFormation, or the AWS CLI/SDK. If you check the cost using AWS Cost Explorer and see that it has increased, you can change back to Standard at any time within a month. Samsung Cloud once again achieved dramatic results, reducing costs by more than 30% after applying Standard-IA without modifying the application code.
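A cost simulation of the kind described above can be sketched as follows. This is not Samsung Cloud's actual selection standard. The two multipliers are assumptions (roughly 60% lower storage price and 25% higher throughput price for Standard-IA relative to Standard) and should be replaced with current pricing for your Region; the monthly cost figures and the minimum-saving threshold are hypothetical.

```python
# Assumed price ratios of Standard-IA relative to Standard (verify against current pricing).
IA_STORAGE_MULTIPLIER = 0.40
IA_THROUGHPUT_MULTIPLIER = 1.25


def storage_share(storage_cost, throughput_cost):
    """Fraction of the table's monthly cost that comes from storage."""
    return storage_cost / (storage_cost + throughput_cost)


def simulated_ia_cost(storage_cost, throughput_cost):
    """Estimated monthly cost of the same table under Standard-IA."""
    return (storage_cost * IA_STORAGE_MULTIPLIER
            + throughput_cost * IA_THROUGHPUT_MULTIPLIER)


def is_ia_candidate(storage_cost, throughput_cost, min_saving=0.10):
    """Select a table only if the simulated saving clears a safety margin."""
    current = storage_cost + throughput_cost
    saving = 1 - simulated_ia_cost(storage_cost, throughput_cost) / current
    return saving >= min_saving


# Hypothetical monthly costs (storage, throughput) in USD for two tables.
tables = {"archive_metadata": (800.0, 200.0), "hot_sessions": (200.0, 800.0)}
for name, (storage, throughput) in tables.items():
    print(name,
          f"storage share={storage_share(storage, throughput):.0%}",
          f"candidate={is_ia_candidate(storage, throughput)}")
```

The safety margin reflects the point made above: because RCU and WCU usage can change, a table that barely breaks even today may cost more under Standard-IA later.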
Conclusion
Since adopting DynamoDB, Samsung Cloud has continuously sought out and applied methods and features to optimize costs. This post introduced five of them (modeling, DynamoDB auto scaling, TTL, reserved capacity, and Standard-IA) that have resulted in significant cost savings. We continue to look for further ways to increase cost efficiency and to refine the methods and procedures behind the five approaches presented here. Features such as auto scaling, TTL, and Standard-IA may seem obvious to those who are new to DynamoDB, but Samsung Cloud has adopted each of them since its launch, and this post shares that history of how DynamoDB features have helped our business grow. To learn more about optimizing costs on DynamoDB, see the DynamoDB Well-Architected Lens in the DynamoDB Developer Guide.
About the Authors
JeongHun Kim is a database engineer at Samsung Electronics and is in charge of databases for Samsung Cloud. He is particularly interested in operational optimization and automation, drawing on his extensive experience with distributed databases.
Hyuk Lee is a Sr. DynamoDB Specialist Solutions Architect based in South Korea. He loves helping customers modernize their architecture using the capabilities of Amazon DynamoDB.
HyoWon Um is a Senior Account Manager based in South Korea who is dedicated to overseeing Samsung Electronics’ Mobile eXperience (MX) division. His paramount objective is to foster trust and establish robust, reliable relationships by distinctly showcasing AWS’s value propositions to Samsung.
Hyeonseong Chang is a Sr. Technical Account Manager who helps customers resolve issues that arise while using AWS and provides technical support on architectural best practices and cost optimization methods to keep mission-critical systems running reliably and efficiently.