On April 29, 2025, Jepsen published a report about transaction visibility behavior in Amazon Relational Database Service (Amazon RDS) for PostgreSQL Multi-AZ clusters. We appreciate Jepsen’s thorough analysis and would like to provide additional context about this behavior, which exists both in Amazon RDS and community PostgreSQL. Coincidentally, our own internal testing recently found this same behavior, and we have already been working with the community to propose a fix for this long-standing issue (recognized and openly discussed by the PostgreSQL community since at least 2013).
The reported issue concerns the order in which transactions become visible (that is, have their results reflected in reads), which can differ between the primary and replicas in cluster configurations. This issue doesn’t lead to data loss or corruption, isn’t present in Single-AZ PostgreSQL deployments, and doesn’t affect Amazon Aurora PostgreSQL Limitless Database or Amazon Aurora DSQL databases.
In this post, we dive into the specifics of the issue to provide further clarity, discuss what classes of architectures it might affect, share workarounds, and highlight our ongoing commitment to improving community PostgreSQL in all areas, including correctness.
Understanding the Long Fork behavior
The report highlights what’s known as a Long Fork anomaly in database literature, which is considered a violation of Snapshot Isolation. Under this anomaly, two readers might observe the effects of transactions in different orders. For example, consider two concurrent transactions, T1 and T2, that modify distinct rows in a PostgreSQL database in a cluster configuration. A query Q1 issued against the primary might find that T1’s effects have already been recorded in the table whereas T2’s have not. Another query Q2 issued against a replica might observe the effects of T2 as already visible but not those of T1.
The reason for this behavior is that on a PostgreSQL primary (in both standalone and replicated configurations), the order in which the effects of non-conflicting transactions become visible might deviate from the order in which they become durable. In other words, the visibility order of transactions doesn’t always match their logged commit order. When PostgreSQL acquires a snapshot, it records the list of transactions that were pending at that time, which are tracked in the ProcArray. The effects of those pending transactions are permanently excluded from the snapshot even after these transactions commit. At commit time, a transaction makes itself durable by recording its status in the Write-Ahead Log (WAL) and then asynchronously removes itself from the ProcArray. Therefore, if T1 and T2 commit concurrently, T1 might become durable before T2 (in WAL) but T2 might remove itself from the ProcArray before T1 does.
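The two-step commit described above can be made concrete with a minimal Python sketch (a toy model with illustrative names such as `ToyPostgres`, not PostgreSQL internals):

```python
class ToyPostgres:
    """Toy model of PostgreSQL snapshot visibility (illustrative only)."""

    def __init__(self):
        self.proc_array = set()  # ids of in-progress transactions
        self.wal = []            # commit order as logged (durability order)

    def begin(self, xid):
        self.proc_array.add(xid)

    def log_commit(self, xid):
        # Commit step 1: write the commit record to the WAL (durable).
        self.wal.append(xid)

    def finish_commit(self, xid):
        # Commit step 2: remove the xid from the ProcArray (visible).
        self.proc_array.discard(xid)

    def snapshot(self):
        # A snapshot permanently excludes transactions pending at this moment,
        # even if they commit later.
        pending = frozenset(self.proc_array)
        return lambda xid: xid in self.wal and xid not in pending


db = ToyPostgres()
db.begin(1)
db.begin(2)            # T1 and T2 run concurrently

db.log_commit(1)       # T1 becomes durable first in the WAL ...
db.log_commit(2)
db.finish_commit(2)    # ... but T2 becomes visible first

q1 = db.snapshot()     # snapshot taken between T2's and T1's second step
db.finish_commit(1)

# Q1 sees T2 but not T1, even though T1 precedes T2 in the WAL:
print(q1(1), q1(2))    # False True
print(db.wal)          # [1, 2]
```

A replica replaying this WAL would make T1 visible before T2, the opposite of what Q1 observed on the primary, which is exactly the Long Fork divergence.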
This behavior has been known to the PostgreSQL community for many years, with detailed discussions occurring on the pgsql-hackers mailing list as far back as 2013. The Long Fork anomaly in community PostgreSQL affects all isolation levels (Read Committed, Repeatable Read, and Serializable). This behavior is not specific to Amazon RDS. It can be reproduced on self-managed PostgreSQL deployments.
Example of potential impact
To illustrate how Long Fork might manifest itself, consider a hypothetical dispute between Alice and Bob on whether the Jepsen post has ever reached the #1 spot on Hacker News. Assume that the number of page views for each post is recorded in separate rows of a relational table stored in a PostgreSQL database. The ranked list of posts is generated using a SQL query that sorts the posts by page view count. Alice and Bob access the website from different locations. Alice’s application server rendering the ranked list routes queries to the PostgreSQL primary, whereas Bob’s queries go to the replica.
Now suppose that Alice and Bob keep refreshing their browsers, watching Jepsen’s post rise in popularity. Alice sees the post reach #1—she takes a screenshot of her webpage for the record. Bob only ever observes the post reaching #2. He disputes Alice’s finding and asks the maintainers of Hacker News to send him the commit log of transactions that incremented page view counters. From the log, Bob determines that the tracked post almost reached the top rank, but right before it did, its page view count was beaten by another post due to someone’s concurrent click. Technically, Bob is right even though he was reading from the replica. Alice has the screenshot to prove her point, yet she witnessed a database state on the primary that she wasn’t supposed to see according to the commit history. If the replica didn’t exist and Alice had no access to the commit logs, her claim would be valid and the observed behavior compliant with Snapshot Isolation semantics.
Aligning visibility order with commit order
Although this behavior represents a deviation from formal Snapshot Isolation guarantees, it rarely impacts application correctness in practice. Most applications naturally serialize their operations through application-level constraints or by operating on related data that creates direct conflicts. PostgreSQL committers considered various solutions that were discussed on mailing lists and presented at PGConf.EU 2024. One such solution makes the visibility order match the commit order using Commit Sequence Numbers (CSNs). The proposed fix is rather involved and spans multiple patches.
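A simplified way to picture the CSN approach (a sketch under the assumption that CSN assignment and visibility happen in one atomic step; the actual patch set is far more involved):

```python
import itertools


class CsnStore:
    """Sketch of commit-sequence-number (CSN) visibility (illustrative)."""

    def __init__(self):
        self._next_csn = itertools.count(1)
        self.commit_csn = {}  # xid -> CSN, assigned atomically at commit

    def commit(self, xid):
        # Assigning the CSN and becoming visible is one atomic step,
        # so visibility order always equals commit order.
        self.commit_csn[xid] = next(self._next_csn)

    def snapshot(self):
        # A snapshot is just the highest CSN at acquisition time; a row
        # version is visible iff its writer's CSN is at or below it.
        horizon = max(self.commit_csn.values(), default=0)
        return lambda xid: self.commit_csn.get(xid, float("inf")) <= horizon


store = CsnStore()
store.commit(1)           # T1 commits first -> CSN 1
snap = store.snapshot()   # snapshot taken between the two commits
store.commit(2)           # T2 commits later -> CSN 2

# Any snapshot that sees T2 must also see T1: no Long Fork is possible.
print(snap(1), snap(2))   # True False
```

Under this scheme the pending-transaction list disappears from the snapshot entirely, which is what aligns visibility order with commit order.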
Even though the Long Fork anomaly is somewhat esoteric from the end-user perspective, fixing it is critical for bringing advanced enterprise-grade capabilities to PostgreSQL clusters. For example:
- Support in distributed systems – Distributed PostgreSQL systems can’t use the visibility order because obtaining a consistent list of pending transactions across PostgreSQL nodes is practically infeasible. However, Aurora Limitless and Aurora DSQL implement consistent snapshots using time-based Multi-Version Concurrency Control (MVCC) instead, which avoids the Long Fork anomaly.
- Query routing and read/write splitting – Offloading read-only queries and subqueries to synchronously updated or caught-up read replicas might cause non-repeatable reads if visibility order deviates from commit order.
- Data synchronization – Taking a copy of the database state using a snapshot query on the primary and rolling the state forward using the transaction log might cause inconsistencies.
- Point-in-time restore – Restoring the database to a specific log sequence number (LSN) might produce a state that was never observable on the primary. This might complicate the analysis of application-caused data corruption because when a query returns incorrect results, it might be impossible to find a database state that the query ran on.
- Storage layout optimization – Replacing the transaction identifiers in tuples with logical or clock-based commit timestamps during query execution might make queries non-repeatable.
- CPU utilization – Large production PostgreSQL servers support thousands of connections. In high-throughput, read-heavy workloads, a measurable fraction of CPU is spent on taking snapshots.
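The time-based MVCC approach mentioned in the first item above can be sketched as follows (an illustrative model, not Aurora’s implementation): if every commit carries a timestamp and every reader picks a read time, all readers that choose the same time—on any node—see the same prefix of commits.

```python
import bisect


class TimeMvcc:
    """Sketch of time-based MVCC snapshots (illustrative only)."""

    def __init__(self):
        self.commits = []  # list of (commit_time, xid), kept sorted by time

    def commit(self, xid, commit_time):
        bisect.insort(self.commits, (commit_time, xid))

    def read_at(self, t):
        # Every reader at time t, on any node, sees the same commit prefix.
        times = [ct for ct, _ in self.commits]
        idx = bisect.bisect_right(times, t)
        return [xid for _, xid in self.commits[:idx]]


db = TimeMvcc()
db.commit("T1", 100)
db.commit("T2", 105)

# Readers on different nodes that pick the same read time agree exactly:
print(db.read_at(102))   # ['T1']
print(db.read_at(110))   # ['T1', 'T2']
```

Because the snapshot is defined by a single timestamp rather than a per-node list of pending transactions, no two readers can ever disagree about the order in which T1 and T2 became visible.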
AWS’s commitment to PostgreSQL
At AWS, we’re deeply committed to PostgreSQL’s success. In 2022, we formed the PostgreSQL Contributors Team, dedicated to contributing to the core PostgreSQL engine. Our team actively participates in the PostgreSQL community’s development efforts. We employ leading database researchers advancing the state of the art in distributed databases. We maintain rigorous systems correctness practices, including formal methods for verification.
We will continue to work with the PostgreSQL community to address the long-standing Snapshot Isolation anomaly in PostgreSQL.
Conclusion
In this post, we discussed the issue of transaction visibility in PostgreSQL clusters with read replicas, including what classes of architectures it might affect.
While we work with the community on a long-term solution, consider taking the following actions:
- Review your application’s assumptions about transaction ordering across nodes and endpoints. Applications should never rely on the implicit commit ordering of independent concurrent transactions.
- Consider using explicit synchronization mechanisms if strict transaction ordering is required. Such mechanisms include shared counters (such as assigned ticket numbers or positions in the job queue), timestamps (such as time of the page view or stock trade execution time) or database constraints (for example, inventory must never go below zero).
- Reach out to AWS Support with specific concerns about your deployments.
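To illustrate the shared-counter mechanism from the second item, here is a hypothetical sketch (the `record_page_view` function and in-memory `events` list are illustrative stand-ins for a table with a ticket column fed by a database sequence):

```python
import itertools

# Instead of inferring order from which writes happen to be visible first,
# each write explicitly records its position from a single monotonic counter.
ticket = itertools.count(1)

events = []  # stands in for a table with a ticket column


def record_page_view(post_id):
    # In SQL this could be a sequence, e.g. INSERT ... nextval('ticket_seq').
    events.append({"ticket": next(ticket), "post": post_id})


record_page_view("jepsen-report")
record_page_view("other-post")

# Any reader, on any node, reconstructs the same order from the tickets,
# regardless of which rows became visible first on that node:
ordered = sorted(events, key=lambda e: e["ticket"])
print([e["post"] for e in ordered])   # ['jepsen-report', 'other-post']
```

The explicit ticket makes the intended ordering part of the data itself, so the application no longer depends on the implicit visibility order of independent concurrent transactions.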
We remain committed to transparency and will continue to work with the PostgreSQL community to advance the state of the art in database technology.
Further reading
To learn more about the work in the PostgreSQL community, how this problem is solved in Aurora DSQL and Aurora Limitless, and the research background, refer to the following resources:
- Aasma, A.: High-concurrency distributed snapshots (PGConf.EU 2024)
- Brooker, M.: AWS re:Invent 2024 – Deep dive into Amazon Aurora DSQL and its architecture
- Sher, A. J., Wein, D.: AWS re:Invent 2024 – Achieving scale with Amazon Aurora PostgreSQL Limitless Database
- Cerone, A., Bernardi, G., and Gotsman, A.: A framework for transactional consistency models with atomic visibility. In 26th International Conference on Concurrency Theory (CONCUR 2015)
- Brooker, M. and Desai, A.: Systems Correctness Practices at AWS: Leveraging Formal and Semi-formal Methods. Queue, 22(6), 2024
About the author
Sergey Melnik is a Senior Principal Technologist at AWS working on distributed systems, data management, and cloud computing.