Universally unique identifiers (UUIDs) are 128-bit values which are designed to be unique without the need of a central authority to manage them. This makes them ideal for the purpose of primary keys in a database, particularly for distributed systems. Historically, one of the most popular methods of generating UUIDs was UUID Version 4 (UUIDv4), which are non-sequential, randomly -generated UUIDs, due to the simplicity of adding them to applications.
The choice of a primary key can have an impact on application performance based on workload patterns. Although UUIDv4 is convenient when serving as a database primary key, its randomness could cause sub-optimal performance when data is being accessed or inserted into a database. Ideally, we would like commonly -accessed data to be clustered together in a database index such as B-tree and its variants. This increases the chance of them being present in the database memory when we query them consecutively, thereby reducing the likelihood of having to fetch data from disk, which is a more time-consuming operation. This issue, known as “temporal locality,†can be observed through a database’s “cache hit ratio“. If an index has a large, random distribution of data, it can negatively impact the cache hit ratio and cause a negative impact on overall performance.
One way to solve the issues due to the randomness of UUIDv4 is to use a primary key that groups commonly accessed data together. One method is to use a timestamp to help co-locate the data, as many applications will more frequently query data that’s created around the same time.
UUID Version 7 (UUIDv7) was introduced to improve the randomness of UUIDv4. UUIDv7 encodes a Unix timestamp with millisecond precision in the first 48 bits of the UUID, meaning that UUIDv7 is time-based and sequential. This eliminates the drawback of poor temporal locality when using UUIDv4. Insertions of data into a database index are clustered based on their sequence of creation. Similarly, retrieving data associated with a specific time also allows you to easily access data that were created close to that time.
Today, PostgreSQL provides built-in support for generating UUIDs that use the UUIDv1, UUIDv3, UUIDv4, and UUIDv5 implementations. However, as of this writing, PostgreSQL doesn’t natively support UUIDv7, but you can add UUIDv7 support today using Trusted Language Extensions for PostgreSQL (pg_tle). Additionally, should PostgreSQL support UUIDv7 in the future, this implementation will work with both older and newer versions of PostgreSQL.
Trusted Language Extensions (pg_tle) for PostgreSQL is a new open source development kit to help you build high performance extensions that run safely on PostgreSQL. It allows developers to create high performance database extensions using popular trusted languages, such as PL/Rust, JavaScript, Perl, and PL/pgSQL. In this post, we demonstrate how to create and install a Trusted Language Extension (TLE) using PL/Rust as the trusted language to generate a UUIDv7. We also take a deeper look into the underlying implementation of the extension.
Solution overview
Amazon Relational Database Service (Amazon RDS) for PostgreSQLÂ supports Trusted Language Extensions and PL/Rust extensions. PL/Rust enables developers to build safe and performant database functions in the Rust programming language. In this example, we install an UUIDv7 TLE that contains three PL/Rust functions to cover the following use cases:
Generate a UUIDv7
Generate a UUIDv7 from a user-provided timestamp
Extract the timestamp from a UUIDv7 as a PostgreSQL timestamp with time zone (timstamptz) type
These three database functions are packaged into a TLE. You can start using UUIDv7 after installing the TLE in PostgreSQL.
Amazon RDS for PostgreSQLÂ supports several other trusted programming languages, including PL/pgSQL, PL/Perl, PL/v8 (JavaScript), and PL/Tcl.
Prerequisites
To run the example in this post, you need to provision an RDS for PostgreSQL instance or Multi-AZ DB cluster running PostgreSQL 16.1 or higher, 15.2 or higher, 14.9 or higher, or 13.12 or higher. Additionally, you need to add pg_tle and plrust to the shared_preload_libraries parameter into a DB parameter group and assign the DB parameter group to your PostgreSQL database instance. You can also directly create a PostgreSQL database instance with the parameter group.
You can use the AWS Command Line Interface (AWS CLI) to create a DB parameter group and add pg_tle and plrust to the shared_preload_libraries parameter:
If you modify the shared_preload_libraries parameter on an existing database instance, the changes will not take effect until the instance is rebooted. You can also modify the parameter group directly from the AWS Management Console. If you create an instance directly with your custom parameter group, you can start using pg_tle and plrust when the instance is available. For more information, see Working with DB parameter groups.
Install the UUIDv7 trusted language extension
In the Trusted Language Extensions open source Github repository, we provide a way for you to install the extension. After you register the UUIDv7 extension into pg_tle, you can check if the extension is available in your desired database with the following command:
You will see the following output:
Next, install the plrust extension in your desired database with the following command:
You will see the following output:
Finally, install the UUIDv7 extension in your desired database with the following command:
You will see the following output:
Now you can start using the uuid_v7 extension.
Generate a UUIDv7
You can generate a UUIDv7 with the following command:
You will see output similar to the following:
You could a see a different UUIDv7 in your environment because UUIDv7s are random, unique, and time-based identifiers by nature.
Generate a UUIDv7 with a given timestamp
You might want to generate a UUIDv7 for data that is associated with a timestamp in the past or future. You can generate a UUIDv7 with a given timestamp using the following command:
You will see the following output:
Extract timestamp from a UUIDv7
You can extract the timestamp with a given UUIDv7 with the following command:
You will see the following output:
A deeper look into the extension
Let’s take a deeper look into the implementation of the extension. As we stated earlier, the extension provides three functions to generate a UUIDv7 and extract timestamp from a given UUIDv7. We use the Rust programming language in this solution because of its emphasis on memory safety and comparable performance with the C programming language. For more information about PL/Rust, see the PL/Rust Guide.
Exploring the generate_uuid_v7 function
All UUIDs are 128-bit long. A UUIDv7 is generated by encoding the Unix timestamp with millisecond precision in the first 48 bits. The remaining bits are used to store the version and variant of the UUID, and randomly -generated bits for uniqueness. The full implementation is available in our Github repository.
Let’s break down the generate_uuid_v7 function into different parts. First, you need to declare any dependencies that you want use in the PL/Rust function. This function uses the Rust rand crate as a dependency. This crate is used to generate random bits and encode them into the UUIDv7:
Let’s look at the code section of the function. First, you define a Rust type alias UuidBytes to represent a vector of u8 with a fixed size of 16:
Next, you need to know the current timestamp for generating the UUIDv7. To retrieve this, you can use the clock_timestamp API provided by the pgrx framework which returns the timestamp as a Datum type. This API is equivalent to the clock_timestamp provided by PostgreSQL.
You need to turn the Datum into a timestamp in millisecond precision. To do so, use the extract_part API provided by pgrx to determine the epoch (number of seconds since 1970-01-01 00:00:00 UTC), and multiply that by 1,000 to get the timestamp in millisecond precision as a u64. The extract_part API provided by pgrx is equivalent to the extract function provided by PostgreSQL.
To generate random bytes, define a helper method that returns a vector of 16 random u8 using the rand Rust crate. Not all 16 u8 are needed; in practice, we only need 10 of them.
Next, you break down the timestamp into a u32 and a u16 such that you can later encode it into the first 48 bits of a UUID. You also need to set the version and variant in the respective field in the UUID. To do this, our implementation uses the uuid Rust crate  as reference.[1]
You can now construct a UuidBytes (a vector of 16 u8) by combining all the components from earlier. Similarly, the implementation uses the Rust crate uuid as reference.[1]
Finally, you can construct a UUID using UuidBytes with the UUID::from_bytes API provided by pgrx.
Exploring the uuid_v7_to_timestamptz function
Next, let’s look at the uuid_v7_to_timestamptz function. You can find the full implementation of the function in the Github repository. This function accepts a UUID as an argument and returns the associated timestamp as a timetamptz PostgreSQL type.
Recall that a UUID is represented as a vector of 16 u8, which sums up to 128 bits. Because the timestamp is encoded in the first 48 bits of the UUIDv7, you will retrieve the first six u8. With the six u8 represented as a byte array in big endian format, you can then construct a u64 from it. This value represents the number of milliseconds between the timestamp since 1970-01-01 00:00:00 UTC.
At this point, you can convert this into a PostgreSQL timestamptz type using the to_timestamp PostgreSQL function. The pgrx framework provides an equivalent function called to_timestamp in Rust. Before that, you should divide the value by 1,000 as a Rust f64 type. You do this because the to_timestamp function in Rust accepts an f64 as argument, and the value needs to be converted from millisecond units to seconds.
Exploring the timestamptz_to_uuid_v7 function
Finally, let’s look at the timestamptz_to_uuid_v7 function. This function takes a timestamptz PostgreSQL type as argument and returns a UUIDv7 for the given timestamp. You can find the full implementation of the function in the Github repository.
The implementation of this function is similar to the generate_uuid_v7 function. The only difference between these two functions is that the timestamptz_to_uuid_v7 function accepts a given timestamp whereas generate_uuid_v7 uses the current timestamp to generate the UUIDv7. Therefore, you can directly generate a UUIDv7 with the given timestamptz instead of using the timestamp returned from the clock_timestamp API. The remaining implementation remains the same as generate_uuid_v7.
Comparing performance of generating UUIDv7 with PL/pgSQL
Let’s compare the performance between a solution using PL/pgSQL, our solution using PL/Rust, and the existing open source pg_uuidv7 extension.
In this test, you can use pgbench to measure the transactions per second to generate UUIDv7s for each option. Use the following command to continuously generate UUIDv7s with one concurrent database client for 30 seconds on an r6i.xlarge instance.
For PL/pgSQL, use the implementation from the following Github repository. For PL/Rust, use the generate_uuid_v7 function. For C, use the uuid_generate_v7 function from the open source pg_uuidv7 extension.
You can observe the following result from the PL/pgSQL implementation:
You can observe the following result from using the pg_uuidv7 extension which is written in C:
Finally, you can observe the following result from using the generate_uuid_v7 function implemented with PL/Rust:
For this experiment, the transactions per second (TPS) of using the function in PL/pgSQL is close to 30% less than the PL/Rust function. The TPS from using the function implemented with PL/Rust is comparable to that of using an UUIDv7 extension that was implemented in C.
Clean up
Uninstall the uuid_v7 extension with following command:
You will see the following output:
You can delete your RDS for PostgreSQL instance if you no longer need to use it.
Conclusion
In this post, we covered some challenges that might arise from the lack of support for UUIDv7. We also covered how Trusted Language Extensions for PostgreSQL (pg_tle) can help you start using UUIDv7. We explored the various functionalities that are provided in our solution, and dived deep into how the solution is implemented and the rationale behind it. Finally, we compared the performance of solutions implemented in different languages, and demonstrated that our solution implemented with pg_tle and PL/Rust is able to perform at a level that’s comparable to the pg_uuidv7 PostgreSQL C extension.
We invite you to leave feedback in the comments section.
About the author
Anthony Leung is a Software Development Engineer focusing on managed PostgreSQL at AWS. He has worked on developing multiple features that are currently offered in Amazon RDS for PostgreSQL.
[1] The Uuid Project is copyright 2013-2014, The Rust Project Developers and copyright 2018, The Uuid Developer. Uuid code is licensed under the Apache License, Version 2.0 (the “Licenseâ€); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS†BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Source: Read More