This post is co-written with TaeHun Yoon and Changsoon Kim from Channel Corporation.
Channel Corporation is a B2B software as a service (SaaS) startup that operates the all-in-one artificial intelligence (AI) messenger Channel Talk. Channel Corporation’s vision is “to solve all problems between customers and companies,” and its first product, Channel Talk, emerged to solve communication issues between customers and companies. Channel Talk is an all-in-one product that provides live chat for customer service, chatbots, voice calls, customer relationship management (CRM) marketing, and in-house messaging. More than 150,000 customers are using Channel Talk in 22 countries, including Korea, Japan, and the US, and more than 70 million calls are made every month.
This two-part blog series starts by presenting the motivation and considerations for migrating from RDBMS to NoSQL. Part 2 covers how Channel Corporation uses streams to implement event-driven architecture.
In this post, we discuss the motivation behind Channel Corporation’s architecture modernization with Amazon DynamoDB, the reason behind choosing DynamoDB, and the four major considerations before migrating from Amazon Relational Database Service (Amazon RDS) for PostgreSQL.
Background: Business growth and emergence of new problems
Channel Talk was initially divided into two components: Team Chat and User Chat. The Team Chat feature, shown in the following screenshot, helps communication within a team.
The User Chat feature, shown in the following screenshot, helps communication between customers and companies.
As the business has grown 2–5 times every year since 2018, the number of requests per second (RPS) in the chat service also began to increase rapidly. Previously, we ran the chat service with Amazon RDS for PostgreSQL, so we naturally scaled up instance types when we needed higher performance.
However, we had to modify our scale-up strategy when we started a marketing service that sent campaign messages and one-time messages. A campaign message is a feature that automatically sends messages based on rules when a customer performs a specific action, for example, a welcome message when a customer signs up or a discount coupon when a customer views a pricing page. A one-time message is a feature that sends a message to target customers just once at a desired time. For example, it can be used to issue limited discount coupons to customers who are currently online.
Channel Talk’s main workload was chat-based one-on-one customer consultation, so it wasn’t difficult to predict the peak traffic. However, the marketing service was expected to create sudden traffic spikes across multiple tables whenever a customer ran a sale event or sent one-time messages to a large number of customers. As a result, Channel Corporation needed a new scalable database, and there were four major considerations:
- Cost inefficiency – Instance-based services must be sized for peak load, and the marketing service was expected to make that peak highly variable.
- Inter-table load propagation – High loads on specific tables and services could affect performance of the entire RDS instance.
- PostgreSQL operation parity – We needed efficient ways in the new database to perform the operations we had relied on in PostgreSQL.
- Migration strategy – We needed a migration plan aligned with our requirements and acceptable downtime.
Reasons to choose DynamoDB
In addition to solving the four problems we mentioned, Channel Corporation chose DynamoDB for three main reasons:
- The ability to implement event sourcing, a common event-driven architecture pattern, with other AWS services
- Rich integration with other AWS services
- ACID transactions
Our approaches to solving the four problems
The first and second problems, both related to the marketing service, could be solved simply by choosing DynamoDB.
We were able to handle the cost inefficiency problem flexibly because DynamoDB provided on-demand capacity mode. This mode instantly accommodated up to double the previous peak traffic, as long as we didn’t exceed double our previous peak within 30 minutes. Also, a table could be pre-warmed if necessary. In addition, provisioned capacity mode provided DynamoDB auto scaling, so it was possible to run steady workloads and set upper limits to prevent overuse. The following two figures show write traffic patterns when one-time bulk messages are generated in user_chat and message DynamoDB tables.
The second problem, inter-table load propagation, could be quickly solved because the load of one table didn’t affect other tables in DynamoDB. Therefore, we could improve our architecture to handle flexible traffic while operating the entire service with stability.
To address the third problem, we examined the key features of Channel Talk chat to see if existing PostgreSQL operations could be replaced with DynamoDB. The following screenshot illustrates this process.
The main features of Channel Talk chat are abstracted as follows:
- Chat room (Chat) – There are messages in a chat room
- Participation information (ChatSession) – Information such as the number of unread messages and the last read time in a specific chat room is recorded
The number of unread messages in the participation information is called a ChatBadge. Because a specific user can belong to multiple chat rooms at the same time, the total number of unread messages in multiple chat rooms is called a ManagerBadge.
These components have two requirements that operate separately:
- Atomicity – When a message occurs in each chat room, ChatBadge is atomically increased for receivers other than the sender
- Count – The sum of all ChatBadges that occurred in each chat room becomes ManagerBadge
Previously, with PostgreSQL, ChatBadge was updated with atomic operations, and ManagerBadge was simply computed as the sum of ChatBadges. However, performing an ad hoc count of records after each user message would not scale, and performance was likely to degrade as the number of chat rooms per user grew. To solve this problem, we decided to use DynamoDB transactions. The following screenshot shows the attributes before the table design change.
The following screenshot shows the attributes after the table design change.
When a message is created in a particular chat room, the following operations are performed on each participant except the sender:
- Create an UpdateItem that increases the participant’s ChatBadge.
- Create an UpdateItem that increases the participant’s ManagerBadge.
- Handle the two UpdateItem operations using a TransactWriteItems API.
This way, we can make sure the sum of ChatBadges becomes ManagerBadge, and we can query each of them with a time complexity of O(1). Of course, this approach can cause transaction conflicts when many messages are generated at the same time, but because most of Channel Talk’s workload is one-on-one conversations for customer consultations, there aren’t many cases where simultaneous messages occur. In addition, when a large number of messages occur simultaneously, we have a procedure to handle them.
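The per-participant steps above can be sketched as a small request builder. This is a minimal illustration, not Channel Corporation's actual code: the table names (`chat_session`, `manager_badge`), the key attributes (`chatId`, `personId`), and the `badge` attribute are all assumptions, since the post does not spell out the schema. The dictionary follows the low-level TransactWriteItems request shape.

```python
import uuid
from typing import Any


def build_badge_transaction(
    chat_id: str,
    recipient_id: str,
    client_request_token: str,
    session_table: str = "chat_session",    # hypothetical table name
    manager_table: str = "manager_badge",   # hypothetical table name
) -> dict[str, Any]:
    """Build one TransactWriteItems request that bumps both counters together."""
    # Shared update clause: atomically add 1 to a numeric "badge" attribute
    # (the attribute name is an assumption for this sketch).
    increment = {
        "UpdateExpression": "ADD #badge :one",
        "ExpressionAttributeNames": {"#badge": "badge"},
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }
    return {
        "ClientRequestToken": client_request_token,
        "TransactItems": [
            # ChatBadge: unread count for this participant in this chat room.
            {"Update": {
                "TableName": session_table,
                "Key": {"chatId": {"S": chat_id}, "personId": {"S": recipient_id}},
                **increment,
            }},
            # ManagerBadge: unread count for this participant across all rooms.
            {"Update": {
                "TableName": manager_table,
                "Key": {"personId": {"S": recipient_id}},
                **increment,
            }},
        ],
    }


def build_requests_for_message(
    chat_id: str, sender_id: str, participant_ids: list[str]
) -> list[dict[str, Any]]:
    """One transaction per participant, skipping the sender."""
    return [
        build_badge_transaction(chat_id, pid, str(uuid.uuid4()))
        for pid in participant_ids
        if pid != sender_id
    ]
```

Each returned dictionary could then be passed to the AWS SDK, for example `boto3.client("dynamodb").transact_write_items(**request)`. Because both updates commit or fail together, the stored ManagerBadge stays equal to the sum of the participant's ChatBadges.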
As shown in the following figure, when a conflict occurs, we retry with exponential backoff. DynamoDB transactions are idempotent when the ClientRequestToken parameter is used, so even if the same request is submitted multiple times because of problems such as connection timeouts, it is applied only once, provided the retries arrive within the token’s 10-minute validity window. This let us keep ChatBadge and ManagerBadge accurate, which solved the third problem.
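A retry loop along these lines might look like the following sketch. It is an assumption-laden illustration rather than the production procedure: `TransactionConflict` is a hypothetical stand-in for whatever conflict error the SDK raises, the delay values are arbitrary, and the jitter is an addition of this sketch that the post does not mention. The important detail is that the token is generated once and reused on every attempt.

```python
import random
import time
import uuid
from typing import Callable, TypeVar

T = TypeVar("T")


class TransactionConflict(Exception):
    """Hypothetical stand-in for a DynamoDB transaction-conflict error."""


def call_with_backoff(
    operation: Callable[[str], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(token), retrying conflicts with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One token for all attempts, so a retry of a request that already
    # succeeded on the server side is recognized as the same request.
    token = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return operation(token)
        except TransactionConflict:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the conflict to the caller
            # The delay ceiling doubles on each attempt, capped at max_delay;
            # the actual wait is a random value below that ceiling (jitter).
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The `operation` callable receives the token and is expected to pass it through as `ClientRequestToken`. Injecting `sleep` keeps the function easy to exercise without real delays.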
Lastly, we performed the migration by stopping the API servers during the early morning hours. Borrowing the segmenting idea of DynamoDB parallel scan, we hashed the PostgreSQL IDs into segments and migrated each segment in parallel with the BatchWriteItem API, using multiple threads.
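The segmenting idea can be sketched as follows. This is a simplified illustration under stated assumptions: the hash function (SHA-256), the `id` field name, and the `write_batch` callable are hypothetical choices, and rows are held in memory for brevity. The batch size of 25 reflects the BatchWriteItem limit on put and delete requests per call.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator

BATCH_SIZE = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def segment_of(record_id: int, total_segments: int) -> int:
    """Map a PostgreSQL ID to a stable segment number in [0, total_segments)."""
    digest = hashlib.sha256(str(record_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % total_segments


def batches(rows: list[dict], size: int = BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def migrate(
    rows: list[dict],
    total_segments: int,
    write_batch: Callable[[list[dict]], None],  # hypothetical BatchWriteItem wrapper
) -> int:
    """Split rows into hash segments and write each segment on its own thread."""
    segments: list[list[dict]] = [[] for _ in range(total_segments)]
    for row in rows:
        segments[segment_of(row["id"], total_segments)].append(row)

    def run_segment(segment_rows: list[dict]) -> int:
        written = 0
        for batch in batches(segment_rows):
            # A real wrapper would also resend any UnprocessedItems returned.
            write_batch(batch)
            written += len(batch)
        return written

    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        return sum(pool.map(run_segment, segments))
```

Hashing the ID rather than using ranges tends to spread rows evenly across segments even when IDs are clustered, so the worker threads finish at roughly the same time.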
Conclusion
DynamoDB provides nearly unlimited throughput and storage, schema flexibility, DynamoDB Streams to enable event-driven architectures, and ACID transactions for multiple tables. Since we adopted DynamoDB, Channel Talk has been operating its business by continuously adding new features without large-scale infrastructure changes in the chat business area. We gained the following main advantages using DynamoDB:
- We have been able to continuously add new features without improving the infrastructure or code structure. For example, even when a feature that would rapidly increase the number of messages is released, such as the ability for bots to generate messages based on rules, we have been able to operate the service without infrastructure issues.
- The cost model of DynamoDB, which allows us to pay only for what we use without separate instance costs, makes it straightforward to change the database cost, which was previously a fixed cost, into a variable cost. Because DynamoDB costs are also decreased when traffic decreases, there is no need to modify the architecture to reduce costs even when the business situation is not good, and the same is true when traffic increases.
- As the business area continues to expand, we are solving various problems by utilizing DynamoDB in areas that require horizontal scalability, such as measuring the usage of all Channel Talk customers in addition to chat.
In Part 2 of this series, we discuss how we integrated our solution with other services to solve problems that couldn’t be solved with DynamoDB alone.
About the Authors
TaeHun (Clsan) Yoon, a seasoned professional in the realm of computer engineering, spearheads innovative solutions as the CTO (Chief Technology Officer) at Channel Corp. With a keen focus on enhancing the chatting experience, TaeHun and his team are dedicated to resolving diverse challenges encountered by customers.
Changsoon (CS) Kim is a DevOps Engineer at Channel Corp. He is interested in efficiently resolving issues between development and operations.
Sungbae Park is an Account Manager in the AWS Startup Team and a regular member of AWS SaaS TFC (Technical Field Communities). He is currently focusing on helping B2B software startups grow and succeed with AWS. Sungbae previously worked as a Partner Development Manager, driving mutual growth with MSP, SI, and ISV partners.
Hyuk Lee is a Sr. DynamoDB Specialist Solutions Architect based in South Korea. He loves helping customers modernize their architecture using the capabilities of Amazon DynamoDB.