In this post, we demonstrate how to enforce a client-side, timestamp-based write order in Amazon DynamoDB. The goal is to ensure that items with lower timestamps don’t overwrite items with higher timestamps, even if the database receives the requests out of order.
Out-of-order requests
To understand why this might be useful, let’s first consider request hedging. Imagine you want a very low p99 read latency. One way to lower tail latencies is to hedge requests. You make a read request and then, if the response doesn’t come back quickly enough, make a second equivalent hedging request and let the two race. First response wins. If the first request suffered a dropped network packet, the second request will probably win. If things are just temporarily slow somewhere, the first request will probably win. Either way, hedging helps improve the p99 metrics, at the cost of some extra read requests.
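As a minimal sketch of read hedging (assuming Python threads; the table name, key, and 50 ms hedge delay are illustrative, not prescriptive):

import concurrent.futures

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table')  # hypothetical table name

def get_item():
    return table.get_item(Key={'PK': 'User#1', 'SK': 'Movie#A'})

# Fire the first request; if it hasn't completed within the hedge
# delay, fire an identical second request and take whichever response
# arrives first.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(get_item)]
    done, _ = concurrent.futures.wait(futures, timeout=0.05)
    if not done:
        futures.append(executor.submit(get_item))
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
    response = done.pop().result()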
You might want to do the same with write requests. You make the initial write request and, if it’s not acknowledged quickly enough, repeat the write request. With writes, however, there’s a concern: if, in the time between the first request and the hedged request, some other thread performs its own update to the same item, the hedged request, because it comes in later, will overwrite the other thread’s work. Ideally, you want the hedged write request to be treated as occurring at the time of the first write attempt.
The same concern can happen even without hedging, due to software development kit (SDK) retries. By default, if the SDK sees a throttle or error response, it internally waits a short while and retries the request. If another thread performs a write to the same item during that wait, the retry could overwrite the other thread’s work. The retry appears to the database like a later update, even though it’s a retry of a previous (unsuccessful) request. Again, it would be good if the retry could be marked as effectively occurring at the time of the first attempt.
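For reference, here is a minimal sketch of where that boto3 retry behavior is configured (the mode and attempt count shown are illustrative, not recommendations):

import boto3
from botocore.config import Config

# boto3 retries throttled and transient-error requests automatically;
# these settings control how many attempts it makes.
dynamodb = boto3.resource(
    'dynamodb',
    config=Config(retries={'max_attempts': 3, 'mode': 'standard'})
)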
The pattern also comes up when processing data streams (such as from Amazon Kinesis). Each entry in the stream usually has a timestamp associated with it. These timestamps are not always in sequence, and even if they are, you would have to work very hard (serializing all write requests) to ensure the order of writes exactly matches the order of the timestamps. It would be good if the timestamps could be used in the write logic, ensuring that lower timestamp values never overwrite higher timestamp values, even with parallel processing.
Timestamp as an attribute
The fundamental requirement here is to use the client-side timestamp to decide if a write should succeed or not. To accomplish this, each item should store its effective timestamp in an attribute. The timestamp attribute doesn’t have to be a partition key or a sort key; a non-key attribute is fine. The name doesn’t matter.
The effective timestamp is the validity time for that data. If the data has some intrinsic timestamp (like coming from an event stream or a remote device), that’s the timestamp to use. When performing multiple writes (due to hedging or retries), the timestamp to use is that of the first write attempt. Retries reuse the same timestamp. That way they aren’t erroneously seen as newer.
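As a sketch of that rule, a hypothetical helper might stamp the item once, up front, so hedged requests and manual retries all reuse the same value:

import time

def put_with_effective_timestamp(table, item):
    # Capture the effective timestamp once, before the first attempt.
    # Any hedged request or retry must send this same item dict so it
    # isn't mistaken for a newer write.
    stamped = {**item, 'Timestamp': int(time.time() * 1000)}
    table.put_item(Item=stamped)
    return stamped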
To enforce write order, have each write include a ConditionExpression saying the write can succeed only if the timestamp of the new item is not older than the timestamp of the existing item. If the timestamps are out of sequence, the write should fail.
It’s essentially a ratchet. Writes should only succeed if the client-side timestamp value is moving forward. Stale data should be rejected.
The following sample demonstrates the design with a table holding the ratings given by a particular user for a particular movie. Each item includes the effective timestamp (in milliseconds) of when that rating was given. Here we’re using Timestamp as the attribute name.
PK | SK | Rating | Timestamp
--- | --- | --- | ---
User#1 | Movie#A | 3 | 1721769060000
User#1 | Movie#B | 4 | 1721768150000
User#2 | Movie#A | 1 | 1721767220000
User#2 | Movie#Z | 5 | 1721757100000
The following sample Python code shows how to perform a PutItem using the condition expression. It allows the write only if the new timestamp (for the incoming item) is greater than or equal to the existing timestamp (for the item in the database). Because Timestamp is a DynamoDB reserved word, the expression refers to it through an expression attribute name:
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table')

item = {
    'PK': 'User#1',
    'SK': 'Movie#A',
    'Rating': 5,
    'Timestamp': 1721770090000
}

# Allow the write if no item exists yet, or if the incoming timestamp
# is at least as new as the stored one. 'Timestamp' is a DynamoDB
# reserved word, so it's referenced via the #ts placeholder.
condition_expression = 'attribute_not_exists(#ts) OR :ts >= #ts'
expression_attribute_names = {'#ts': 'Timestamp'}
expression_attribute_values = {':ts': item['Timestamp']}

response = table.put_item(
    Item=item,
    ConditionExpression=condition_expression,
    ExpressionAttributeNames=expression_attribute_names,
    ExpressionAttributeValues=expression_attribute_values
)
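When the condition rejects a stale write, DynamoDB returns a ConditionalCheckFailedException, which boto3 surfaces as a ClientError. Here is a sketch of treating that rejection as the desired outcome rather than an error:

from botocore.exceptions import ClientError

try:
    table.put_item(
        Item=item,
        ConditionExpression=condition_expression,
        ExpressionAttributeNames=expression_attribute_names,
        ExpressionAttributeValues=expression_attribute_values
    )
except ClientError as e:
    if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
        # The stored item already has a newer timestamp; the stale
        # write was correctly rejected. Nothing to do.
        pass
    else:
        raise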
Deletions as tombstones
Handling deletions requires the use of tombstones. A tombstone is a version of an item that marks a soft delete of that item. It’s necessary because if you truly delete the item by removing it from the database, then older data might effectively recreate it because there would be no record in the database of the delete having happened.
To implement a tombstone, you change any delete action into a soft delete by, for example, setting a Deleted flag. The following code demonstrates this by soft deleting User#2’s review of Movie#Z. To save space (and to remove the item from secondary indexes), it eliminates all other non-key attributes, but it keeps the timestamp, which indicates the time of the soft delete. It also adds a TTL attribute to the tombstone, as discussed after the code example.
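Here is a minimal sketch of that soft delete, reusing the #ts alias from the earlier example (attribute names match the table that follows; the 7-day TTL window is discussed below):

tombstone = {
    'PK': 'User#2',
    'SK': 'Movie#Z',
    'Timestamp': 1721757900000,  # time of the soft delete, in milliseconds
    'Deleted': True,
    # TTL is in epoch seconds; schedule the tombstone itself for
    # removal 7 days after the soft delete.
    'TTL': 1721757900 + 7 * 24 * 60 * 60
}

# The same ratchet applies: the soft delete only wins if it isn't
# older than the stored item. Putting the tombstone replaces the
# whole item, which drops the Rating attribute.
table.put_item(
    Item=tombstone,
    ConditionExpression='attribute_not_exists(#ts) OR :ts >= #ts',
    ExpressionAttributeNames={'#ts': 'Timestamp'},
    ExpressionAttributeValues={':ts': tombstone['Timestamp']}
)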
The following is a version of the table with a tombstone after User#2 removed their rating of Movie#Z.
PK | SK | Rating | Timestamp | Deleted | TTL
--- | --- | --- | --- | --- | ---
User#1 | Movie#A | 5 | 1721770090000 | |
User#1 | Movie#B | 4 | 1721768150000 | |
User#2 | Movie#A | 1 | 1721767220000 | |
User#2 | Movie#Z | | 1721757900000 | True | 1722362700
Inserting a new version of this item (where newer is defined by the Timestamp value) on top of this tombstone will succeed and replace the tombstone with the new live version.
When you query against this data model, it works the same as usual, except you must suppress any tombstones rather than return them to the user. When performing a Query, you can use a FilterExpression for this; but when performing a GetItem or BatchGetItem, there is no filter available, so you must retrieve the item, inspect it in your client code, and decide whether to pass it on or suppress it.
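For example, a Query that suppresses tombstones might look like the following sketch (the key value is illustrative):

from boto3.dynamodb.conditions import Key, Attr

# Return User#2's ratings, filtering out soft-deleted items. The
# filter runs server-side after the read, so tombstones still consume
# read capacity even though they aren't returned.
response = table.query(
    KeyConditionExpression=Key('PK').eq('User#2'),
    FilterExpression=Attr('Deleted').not_exists()
)
items = response['Items']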
You can delete the tombstone item once enough time has passed that you’re sure you won’t receive any writes for that item preceding the tombstone time. The easiest way to manage that deletion is to set a time to live (TTL) attribute on the tombstone item at insertion time, set to some days in the future. The preceding example sets it 7 days in the future. Make sure you enable TTL on your table.
Put and Delete operations only
Note that this technique works for applications using only PutItem and DeleteItem operations. These operations always write the whole item and accept a ConditionExpression parameter, as the design requires.
The BatchWriteItem operation does not support conditions, so it cannot be used. The UpdateItem operation can modify just a portion of an item, so suppressing old updates is risky: they might be updating an unrelated section of the item. Only use this technique with UpdateItem if the same portion of the item is always being updated, making it safe to reject any out-of-order writes without impacting the final value.
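If that condition holds, the same ratchet carries over directly; here is a sketch of a guarded UpdateItem (the key and values are illustrative):

# Safe only because every writer updates this same portion of the
# item (Rating plus Timestamp), so rejecting a stale update cannot
# discard an unrelated change.
table.update_item(
    Key={'PK': 'User#1', 'SK': 'Movie#A'},
    UpdateExpression='SET #r = :r, #ts = :ts',
    ConditionExpression='attribute_not_exists(#ts) OR :ts >= #ts',
    ExpressionAttributeNames={'#r': 'Rating', '#ts': 'Timestamp'},
    ExpressionAttributeValues={':r': 5, ':ts': 1721770090000}
)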
These techniques also assume the source of the timestamp values has mitigated clock skew and that the timestamps associated with the data are sufficiently accurate.
Lastly, it’s not actually necessary for the timestamp to be a timestamp. It can be any monotonically increasing token value that indicates the proper sequence order.
Cost analysis
This design should incur minimal cost in typical usage. Each item holds a bit more data because of the added attribute, and some extra items are held as tombstones to represent soft deletes. Reads and writes perform the same as they would without the design, although a few extra tombstone items might be read (and filtered out) by Query calls. TTL removes the tombstones at no cost.
Conclusion
In this post we showed how you can use a timestamp attribute and a condition expression to ensure older writes don’t succeed on top of newer writes. It’s straightforward, suitable for most use cases, and adds only a small amount of extra cost. Remember to use tombstones to mark deletes and suppress those tombstone items in your code before returning them.
Let us know your feedback in the comments section. You can find more DynamoDB posts and other posts written by Jason Hunter in the AWS Database Blog.
About the Author
Jason Hunter is a California-based Principal Solutions Architect specializing in Amazon DynamoDB. He’s been working with NoSQL databases since 2003. He’s known for his contributions to Java, open source, and XML.