On April 29, 2025, Jepsen published a report about transaction visibility behavior in Amazon Relational Database Service (Amazon RDS) for PostgreSQL Multi-AZ clusters. We appreciate Jepsen’s thorough analysis and would like to provide additional context about this behavior, which exists both in Amazon RDS and community PostgreSQL. Coincidentally, our own internal testing recently found this same behavior, and we have already been working with the community to propose a fix for this long-standing issue (recognized and openly discussed by the PostgreSQL community since at least 2013).
The reported issue concerns the order in which transactions become visible (that is, have their results reflected in reads), which can differ between the primary and replicas in cluster configurations. This issue doesn’t lead to data loss or corruption, isn’t present in Single-AZ PostgreSQL deployments, and doesn’t affect Amazon Aurora PostgreSQL Limitless Database or Amazon Aurora DSQL databases.
In this post, we dive into the specifics of the issue to provide further clarity, discuss what classes of architectures it might affect, share workarounds, and highlight our ongoing commitment to improving community PostgreSQL in all areas, including correctness.
Understanding the Long Fork behavior
The report highlights what’s known as a Long Fork anomaly in database literature, which is considered a violation of Snapshot Isolation. Under this anomaly, two readers might observe the effects of transactions in different orders. For example, consider two concurrent transactions, T1 and T2, that modify distinct rows in a PostgreSQL database in a cluster configuration. A query Q1 issued against the primary might find that T1’s effects have already been recorded in the table whereas T2’s have not. Another query Q2 issued against a replica might observe the effects of T2 as already visible but not those of T1.
The reason for this behavior is that on a PostgreSQL primary (in both standalone and replicated configurations), the order in which the effects of non-conflicting transactions become visible might deviate from the order in which they become durable. In other words, the visibility order of transactions doesn’t always match their logged commit order. When PostgreSQL acquires a snapshot, it records the list of transactions that were pending at that time, which are tracked in the ProcArray. The effects of those pending transactions are permanently excluded from the snapshot even after these transactions commit. At commit time, a transaction makes itself durable by recording its status in the Write-Ahead Log (WAL) and then asynchronously removes itself from the ProcArray. Therefore, if T1 and T2 commit concurrently, T1 might become durable before T2 (in WAL) but T2 might remove itself from the ProcArray before T1 does.
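The two-step commit described above can be made concrete with a minimal Python sketch (a toy model with illustrative names such as `ToyPostgres`, not PostgreSQL internals):

```python
class ToyPostgres:
    """Toy model of PostgreSQL snapshot visibility (illustrative only)."""

    def __init__(self):
        self.proc_array = set()  # ids of in-progress transactions
        self.wal = []            # commit order as logged (durability order)

    def begin(self, xid):
        self.proc_array.add(xid)

    def log_commit(self, xid):
        # Commit step 1: write the commit record to the WAL (durable).
        self.wal.append(xid)

    def finish_commit(self, xid):
        # Commit step 2: remove the xid from the ProcArray (visible).
        self.proc_array.discard(xid)

    def snapshot(self):
        # A snapshot permanently excludes transactions pending at this moment,
        # even if they commit later.
        pending = frozenset(self.proc_array)
        return lambda xid: xid in self.wal and xid not in pending


db = ToyPostgres()
db.begin(1)
db.begin(2)            # T1 and T2 run concurrently

db.log_commit(1)       # T1 becomes durable first in the WAL ...
db.log_commit(2)
db.finish_commit(2)    # ... but T2 becomes visible first

q1 = db.snapshot()     # snapshot taken between T2's and T1's second step
db.finish_commit(1)

# Q1 sees T2 but not T1, even though T1 precedes T2 in the WAL:
print(q1(1), q1(2))    # False True
print(db.wal)          # [1, 2]
```

A replica replaying this WAL would make T1 visible before T2, the opposite of what Q1 observed on the primary, which is exactly the Long Fork divergence.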
This behavior has been known to the PostgreSQL community for many years, with detailed discussions occurring on the pgsql-hackers mailing list as far back as 2013. The Long Fork anomaly in community PostgreSQL affects all isolation levels (Read Committed, Repeatable Read, and Serializable). This behavior is not specific to Amazon RDS. It can be reproduced on self-managed PostgreSQL deployments.
Example of potential impact
To illustrate how Long Fork might manifest itself, consider a hypothetical dispute between Alice and Bob on whether the Jepsen post has ever reached the #1 spot on Hacker News. Assume that the number of page views for each post is recorded in separate rows of a relational table stored in a PostgreSQL database. The ranked list of posts is generated using a SQL query that sorts the posts by page view count. Alice and Bob access the website from different locations. Alice’s application server rendering the ranked list routes queries to the PostgreSQL primary, whereas Bob’s queries go to the replica.
Now suppose that Alice and Bob keep refreshing their browsers, watching Jepsen’s post rise in popularity. Alice sees the post reach #1—she takes a screenshot of her webpage for the record. Bob only ever observes the post reaching #2. He disputes Alice’s finding and asks the maintainers of Hacker News to send him the commit log of transactions that incremented page view counters. From the log, Bob determines that the tracked post almost reached the top rank, but right before it did, its page view count was beaten by another post due to someone’s concurrent click. Technically, Bob is right even though he was reading from the replica. Alice has the screenshot to prove her point, yet she witnessed a database state on the primary that she wasn’t supposed to see according to the commit history. If the replica didn’t exist and Alice had no access to the commit logs, her claim would be valid and the observed behavior compliant with Snapshot Isolation semantics.
Aligning visibility order with commit order
Although this behavior represents a deviation from formal Snapshot Isolation guarantees, it rarely impacts application correctness in practice. Most applications naturally serialize their operations through application-level constraints or by operating on related data that creates direct conflicts. PostgreSQL committers considered various solutions that were discussed on mailing lists and presented at PGConf.EU 2024. One such solution makes the visibility order match the commit order using Commit Sequence Numbers (CSNs). The proposed fix is rather involved and spans multiple patches.
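A simplified way to picture the CSN approach (a sketch under the assumption that CSN assignment and visibility happen in one atomic step; the actual patch set is far more involved):

```python
import itertools


class CsnStore:
    """Sketch of commit-sequence-number (CSN) visibility (illustrative)."""

    def __init__(self):
        self._next_csn = itertools.count(1)
        self.commit_csn = {}  # xid -> CSN, assigned atomically at commit

    def commit(self, xid):
        # Assigning the CSN and becoming visible is one atomic step,
        # so visibility order always equals commit order.
        self.commit_csn[xid] = next(self._next_csn)

    def snapshot(self):
        # A snapshot is just the highest CSN at acquisition time; a row
        # version is visible iff its writer's CSN is at or below it.
        horizon = max(self.commit_csn.values(), default=0)
        return lambda xid: self.commit_csn.get(xid, float("inf")) <= horizon


store = CsnStore()
store.commit(1)           # T1 commits first -> CSN 1
snap = store.snapshot()   # snapshot taken between the two commits
store.commit(2)           # T2 commits later -> CSN 2

# Any snapshot that sees T2 must also see T1: no Long Fork is possible.
print(snap(1), snap(2))   # True False
```

Under this scheme the pending-transaction list disappears from the snapshot entirely, which is what aligns visibility order with commit order.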
Even though the Long Fork anomaly is somewhat esoteric from the end-user perspective, fixing it is critical for bringing advanced enterprise-grade capabilities to PostgreSQL clusters. For example:
- Support in distributed systems – Distributed PostgreSQL systems can’t use the visibility order because obtaining a consistent list of pending transactions across PostgreSQL nodes is practically infeasible. However, Aurora Limitless and Aurora DSQL implement consistent snapshots using time-based Multi-Version Concurrency Control (MVCC) instead, which avoids the Long Fork anomaly.
- Query routing and read/write splitting – Offloading read-only queries and subqueries to synchronously updated or caught-up read replicas might cause non-repeatable reads if visibility order deviates from commit order.
- Data synchronization – Taking a copy of the database state using a snapshot query on the primary and rolling the state forward using the transaction log might cause inconsistencies.
- Point-in-time restore – Restoring the database to a specific log sequence number (LSN) might produce a state that was never observable on the primary. This might complicate the analysis of application-caused data corruption because when a query returns incorrect results, it might be impossible to find a database state that the query ran on.
- Storage layout optimization – Replacing the transaction identifiers in tuples with logical or clock-based commit timestamps during query execution might make queries non-repeatable.
- CPU utilization – Large production PostgreSQL servers support thousands of connections. In high-throughput, read-heavy workloads, a measurable fraction of CPU is spent on taking snapshots.
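The time-based MVCC approach mentioned in the first item above can be sketched as follows (an illustrative model, not Aurora’s implementation): if every commit carries a timestamp and every reader picks a read time, all readers that choose the same time—on any node—see the same prefix of commits.

```python
import bisect


class TimeMvcc:
    """Sketch of time-based MVCC snapshots (illustrative only)."""

    def __init__(self):
        self.commits = []  # list of (commit_time, xid), kept sorted by time

    def commit(self, xid, commit_time):
        bisect.insort(self.commits, (commit_time, xid))

    def read_at(self, t):
        # Every reader at time t, on any node, sees the same commit prefix.
        times = [ct for ct, _ in self.commits]
        idx = bisect.bisect_right(times, t)
        return [xid for _, xid in self.commits[:idx]]


db = TimeMvcc()
db.commit("T1", 100)
db.commit("T2", 105)

# Readers on different nodes that pick the same read time agree exactly:
print(db.read_at(102))   # ['T1']
print(db.read_at(110))   # ['T1', 'T2']
```

Because the snapshot is defined by a single timestamp rather than a per-node list of pending transactions, no two readers can ever disagree about the order in which T1 and T2 became visible.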
AWS’s commitment to PostgreSQL
At AWS, we’re deeply committed to PostgreSQL’s success. In 2022, we formed the PostgreSQL Contributors Team, dedicated to contributing to the core PostgreSQL engine. Our team actively participates in the PostgreSQL community’s development efforts. We employ leading database researchers advancing the state of the art in distributed databases. We maintain rigorous systems correctness practices, including formal methods for verification.
We will continue to work with the PostgreSQL community to address the long-standing Snapshot Isolation anomaly in PostgreSQL.
Conclusion
In this post, we discussed the issue of transaction visibility in PostgreSQL clusters with read replicas, including what classes of architectures it might affect.
While we work with the community on a long-term solution, consider taking the following actions:
- Review your application’s assumptions about transaction ordering across nodes and endpoints. Applications should never rely on the implicit commit ordering of independent concurrent transactions.
- Consider using explicit synchronization mechanisms if strict transaction ordering is required. Such mechanisms include shared counters (such as assigned ticket numbers or positions in the job queue), timestamps (such as time of the page view or stock trade execution time) or database constraints (for example, inventory must never go below zero).
- Reach out to AWS Support with specific concerns about your deployments.
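To illustrate the shared-counter mechanism from the second item, here is a hypothetical sketch (the `record_page_view` function and in-memory `events` list are illustrative stand-ins for a table with a ticket column fed by a database sequence):

```python
import itertools

# Instead of inferring order from which writes happen to be visible first,
# each write explicitly records its position from a single monotonic counter.
ticket = itertools.count(1)

events = []  # stands in for a table with a ticket column


def record_page_view(post_id):
    # In SQL this could be a sequence, e.g. INSERT ... nextval('ticket_seq').
    events.append({"ticket": next(ticket), "post": post_id})


record_page_view("jepsen-report")
record_page_view("other-post")

# Any reader, on any node, reconstructs the same order from the tickets,
# regardless of which rows became visible first on that node:
ordered = sorted(events, key=lambda e: e["ticket"])
print([e["post"] for e in ordered])   # ['jepsen-report', 'other-post']
```

The explicit ticket makes the intended ordering part of the data itself, so the application no longer depends on the implicit visibility order of independent concurrent transactions.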
We remain committed to transparency and will continue to work with the PostgreSQL community to advance the state of the art in database technology.
Further reading
To learn more about the work in the PostgreSQL community, how this problem is solved in Aurora DSQL and Aurora Limitless, and the research background, refer to the following resources:
- Aasma, A.: High-concurrency distributed snapshots (PGConf.EU 2024)
- Brooker, M.: AWS re:Invent 2024 – Deep dive into Amazon Aurora DSQL and its architecture
- Sher, A. J., Wein, D.: AWS re:Invent 2024 – Achieving scale with Amazon Aurora PostgreSQL Limitless Database
- Cerone, A., Bernardi, G., and Gotsman, A.: A framework for transactional consistency models with atomic visibility. In 26th International Conference on Concurrency Theory (CONCUR 2015)
- Brooker, M. and Desai, A.: Systems Correctness Practices at AWS: Leveraging Formal and Semi-formal Methods. Queue, 22(6), 2024
About the author
Sergey Melnik is a Senior Principal Technologist at AWS working on distributed systems, data management, and cloud computing.