
    Improve PostgreSQL performance using the pgstattuple extension

    April 2, 2025

    As businesses continue to generate and store vast amounts of data, the need for efficient and reliable database management systems becomes increasingly important. PostgreSQL, an open source relational database management system (RDBMS), has established itself as a powerful solution for handling complex data requirements. One of the key strengths of PostgreSQL lies in its extensibility. Through a rich ecosystem of extensions and plugins, developers can enhance a database’s functionality to meet specific requirements. These extensions range from spatial data support and full-text search capabilities to advanced data types and performance optimization tools. Although PostgreSQL offers a wide range of features and capabilities, one extension that often goes overlooked is pgstattuple, a tool that can provide significant value for gaining insights into the internal workings of PostgreSQL databases.

    In this post, we explore pgstattuple in depth—what insights it offers, how to use it to diagnose issues in Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL, and best practices for harnessing its capabilities.

    Overview of pgstattuple

    The pgstattuple extension provides a set of functions to query detailed statistics at the tuple level within PostgreSQL tables and indexes. This offers visibility into the physical storage layer that standard PostgreSQL statistics views don’t provide.

    Some of the table and index level metrics exposed by pgstattuple include:

    • tuple_count – Number of live tuples
    • dead_tuple_count – Number of dead tuples not yet cleaned up
    • tuple_len – Total length of live tuples in bytes
    • free_space – Total free space available in bytes
    • free_percent – Percentage of free space; higher values indicate more bloat
    • dead_tuple_len – Total length of dead tuples in bytes
    • dead_tuple_percent – Percentage of space occupied by dead tuples

    These metrics are more than just numbers: they’re an early warning system for database health and performance issues. By monitoring these statistics, you can proactively identify storage issues that might be silently impacting your database performance. Whether it’s excessive table bloat consuming disk space or index fragmentation slowing down queries, pgstattuple helps you spot these issues before they become critical problems.

    Using pgstattuple in Aurora and Amazon RDS

    Both Aurora and Amazon RDS support using the pgstattuple extension. To enable it, you first need to create the extension in your database using the CREATE EXTENSION pgstattuple; command. After it’s enabled, you can use functions like pgstattuple(relation) to get details on the physical storage used by a table including the number of pages, live tuples, dead tuples, and more. The pgstattuple_approx(relation) function provides a faster estimate of these metrics. You can also get index statistics using pgstatindex(index). Analyzing this low-level data can help identify bloated tables that need vacuuming, find tables with high dead tuple ratios that could benefit from being rewritten, and optimize your database’s physical storage utilization.
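
    The following is a minimal session sketch (my_table and my_table_pkey are hypothetical placeholder names):

    -- Enable the extension (requires sufficient privileges)
    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- Exact tuple-level statistics for a table (performs a full scan)
    SELECT * FROM pgstattuple('my_table');

    -- Faster, approximate statistics for the same table
    SELECT * FROM pgstattuple_approx('my_table');

    -- B-tree index statistics
    SELECT * FROM pgstatindex('my_table_pkey');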

    The output of pgstattuple provides actionable insights for monitoring, maintenance, and performance tuning, as discussed in the following sections.

    Detecting and managing table bloat

    Identifying and managing bloat is one of the most useful applications of pgstattuple for PostgreSQL tables. Bloat arises when UPDATE and DELETE operations leave behind unused space not automatically reclaimed. PostgreSQL maintains data consistency through a Multiversion Concurrency Control (MVCC) model, where each SQL statement sees a snapshot of data from some prior time, regardless of the underlying data’s current state. This prevents statements from viewing inconsistent data due to concurrent transactions updating the same rows, providing transaction isolation per database session. Unlike traditional locking methods, MVCC minimizes lock contention, allowing reasonable multiuser performance.

    When deleting a row in MVCC systems like PostgreSQL, the row isn’t immediately removed from data pages. Instead, it’s marked as deleted or expired for the current transaction but remains visible to transactions viewing an older snapshot, avoiding conflicts. As transactions complete, these dead or expired tuples are expected to eventually be vacuumed and the space is reclaimed. In PostgreSQL, an UPDATE operation is equivalent to a combination of DELETE and INSERT. When a row is updated, PostgreSQL marks the old version as expired (like a DELETE) but keeps it visible to older transaction snapshots. It then inserts a new version of the row with the updated values (like an INSERT). Over time, the expired row versions accumulate until the VACUUM process removes them, reclaiming space. This approach enables PostgreSQL’s MVCC model, providing snapshot isolation without explicit locking during updates.

    PostgreSQL’s autovacuum is an automated maintenance process that reclaims storage occupied by dead tuples and updates the statistics used by the query planner. Autovacuum triggers when a table’s maximum transaction age crosses autovacuum_freeze_max_age, or when the number of dead tuples exceeds the threshold: autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of tuples. In this formula, autovacuum_vacuum_threshold is the minimum number of updated or deleted tuples needed to initiate cleanup, and autovacuum_vacuum_scale_factor is the fraction of the table size added to the threshold to determine when maintenance should occur. If autovacuum can’t clean up dead tuples (for example, because a long-running transaction holds back the oldest visible snapshot), you may need to handle highly bloated tables manually.
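
    For example, with the default autovacuum_vacuum_threshold of 50 and autovacuum_vacuum_scale_factor of 0.2, a table containing 1,000,000 tuples isn’t vacuumed until roughly 50 + 0.2 * 1,000,000 = 200,050 tuples have been updated or deleted.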

    Dead tuples are stored alongside live tuples in data pages. Bloat can also come from free space within pages, for example after autovacuum has cleaned up dead tuples. During query execution, PostgreSQL scans more pages than necessary when they are filled with dead tuples, causing increased I/O and slower queries. Highly bloated tables make database workloads consume unnecessary read I/O, impacting application performance.

    Before we dive into analyzing table bloat with pgstattuple, let’s ensure you have everything set up to follow along. You’ll need access to an Amazon RDS or Aurora PostgreSQL instance, as well as a client machine with psql installed and properly configured to connect to your database. Make sure you have the necessary permissions to create tables and install extensions in your PostgreSQL environment. For this demonstration, we’ll use the pgbench_accounts table. If you don’t have this table already, you can easily create it using the pgbench utility. Run the command pgbench -i -s 10 to initialize a pgbench schema with a scale factor of 10, which will create the pgbench_accounts table along with other necessary tables. This will provide us with sample data to work with in our analysis. Additionally, you should have the pgstattuple extension installed on your database instance. If you haven’t installed it yet, you can do so by running CREATE EXTENSION pgstattuple; as a user with sufficient privileges. With these prerequisites in place, you’ll be ready to explore table bloat analysis using real data in a controlled environment.
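
    As a quick sanity check before proceeding, you can confirm that the extension is shipped with your instance and created in the current database (output varies by engine version):

    -- Is pgstattuple available on this instance?
    SELECT name, default_version, installed_version
    FROM pg_available_extensions
    WHERE name = 'pgstattuple';

    -- Has it been created in the current database?
    SELECT extname, extversion FROM pg_extension WHERE extname = 'pgstattuple';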

    While pgstattuple provides comprehensive table bloat analysis, it can be resource-intensive. We recommend first using lightweight bloat estimation queries, such as the check_postgres queries documented on the PostgreSQL wiki. If more detailed analysis is required, the following example demonstrates how to use pgstattuple to analyze bloat in a table.

    Create the table pgbench_accounts_test with 10,000 records:

    
    bench=> create table pgbench_accounts_test as (select * from pgbench_accounts limit 10000);
    SELECT 10000

    In this example, querying pgstattuple returns a dead tuple count of zero, and pg_total_relation_size reports a total size of 1672 kB:

    SELECT * from pgstattuple('pgbench_accounts_test');
    -[ RECORD 1 ]------+--------
    table_len          | 1343488
    tuple_count        | 10000
    tuple_len          | 1210000
    tuple_percent      | 90.06
    dead_tuple_count   | 0
    dead_tuple_len     | 0
    dead_tuple_percent | 0
    free_space         | 18896
    free_percent       | 1.41

    SELECT pg_size_pretty(pg_total_relation_size('public.pgbench_accounts_test'));
     pg_size_pretty
    ----------------
     1672 kB
    (1 row)

    To demonstrate the usage of pgstattuple, we turn off autovacuum (not recommended in a production environment), and update 2,500 records:

    SHOW autovacuum;
     autovacuum
    ------------
     off
    
    UPDATE pgbench_accounts_test SET bid=2 WHERE aid <2501;
    UPDATE 2500

    Now, pgstattuple data for this table shows that the 2,500 old row versions have become dead tuples:

    SELECT * from pgstattuple('pgbench_accounts_test');
    -[ RECORD 1 ]------+--------
    table_len          | 1679360
    tuple_count        | 10000
    tuple_len          | 1210000
    tuple_percent      | 72.05
    dead_tuple_count   | 2500
    dead_tuple_len     | 302500
    dead_tuple_percent | 18.01
    free_space         | 23620
    free_percent       | 1.41

    bench=> SELECT pg_size_pretty(pg_total_relation_size('public.pgbench_accounts_test'));
     pg_size_pretty
    ----------------
     1672 kB
    (1 row)

    bloat_percentage in PostgreSQL refers to the ratio of space that could be reclaimed in a table or index compared to its total size. It can be calculated using data from pgstattuple as follows:

    bloat_percentage = (table_len - tuple_len) / table_len * 100
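
    Using the post-update numbers from the preceding output: (1679360 - 1210000) / 1679360 * 100 ≈ 27.95%.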

    A bloat_percentage value exceeding 30%–40% often indicates problematic bloat requiring attention. To clean up bloat, use the VACUUM command:

    VACUUM pgbench_accounts_test;

    Let’s inspect the pgstattuple data after the VACUUM operation:

    SELECT * from pgstattuple('pgbench_accounts_test');
    -[ RECORD 1 ]------+--------
    table_len          | 1679360
    tuple_count        | 10000
    tuple_len          | 1210000
    tuple_percent      | 72.05
    dead_tuple_count   | 0
    dead_tuple_len     | 0
    dead_tuple_percent | 0
    free_space         | 353220
    free_percent       | 21.03

    The VACUUM operation resets dead_tuple_count to 0. The freed space stays attached to the table and is available for subsequent inserts and updates in the same table, which is why table_len is unchanged after the VACUUM operation. To return the disk storage occupied by bloat to the operating system, there are two options:

    • VACUUM FULL – VACUUM FULL can reclaim more disk space but runs much more slowly. It requires an ACCESS EXCLUSIVE lock on the table it’s working on, and therefore can’t be done in parallel with other uses of the table. While VACUUM FULL operations are generally discouraged in production environments, they might be acceptable during scheduled maintenance windows where downtime is planned and approved.
    • pg_repack – pg_repack is a PostgreSQL extension that efficiently removes table and index bloat while maintaining online availability. Unlike CLUSTER and VACUUM FULL, it minimizes exclusive lock duration during processing, offering performance comparable to CLUSTER. While pg_repack allows for online table and index reorganization with minimal application downtime, it’s important to consider its limitations. The extension still requires brief exclusive locks during operation and may struggle to complete on tables with high-velocity transactions, potentially impacting database performance. For heavily-used tables where full repacking is challenging, consider the alternative of index-only repacking. Best practices include thorough testing in a non-production environment, scheduling during low-traffic periods, and having a monitoring and rollback plan in place. Despite its benefits, users should be aware of the potential risks and plan accordingly when implementing pg_repack in their PostgreSQL environments.

    The VACUUM FULL operation reduces the table_len:

    VACUUM FULL pgbench_accounts_test;
    VACUUM
    SELECT * from pgstattuple('pgbench_accounts_test');
    -[ RECORD 1 ]------+--------
    table_len          | 1343488
    tuple_count        | 10000
    tuple_len          | 1210000
    tuple_percent      | 90.06
    dead_tuple_count   | 0
    dead_tuple_len     | 0
    dead_tuple_percent | 0
    free_space         | 18896
    free_percent       | 1.41

    The VACUUM FULL operation returns the wasted space to the operating system, which is why table_len drops back to its original value. The following query identifies bloat for the top 10 largest tables in your database using pgstattuple:

    SELECT relname,
           (table_len - tuple_len) AS bloat_size,
           ROUND(100 * (table_len - tuple_len)::numeric / NULLIF(table_len::numeric, 0), 2) AS bloat_percentage
    FROM (
        SELECT relname,
               (pgstattuple(oid)).table_len,
               (pgstattuple(oid)).tuple_len
        FROM pg_class
        WHERE relname IN (
            SELECT relname
            FROM pg_statio_user_tables
            WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
            ORDER BY pg_total_relation_size(relid) DESC,
                     pg_relation_size(relid) DESC
            LIMIT 10
        )
    ) AS subquery
    ORDER BY bloat_percentage DESC;

            relname        | bloat_size  | bloat_percentage
    -----------------------+-------------+------------------
     pgbench_accounts_test | 469360      | 27.95
     pgbench_tellers       | 1307008     | 26.64
     pgbench_history       | 2081296     | 25.15
     pgbench_accounts      | 27945461248 | 18.76
     cust_accounts         | 267097      | 9.94
     seller_accounts       | 2661689     | 9.91
     prod_accounts         | 2661689     | 9.91
     prod_teller           | 2661689     | 9.91
     pgbench_accounts_1    | 6647546624  | 9.90
     stage_accounts        | 27659129    | 9.90
    (10 rows)

    pgstattuple does a full table scan and may cause higher consumption of instance resources such as CPU and I/O. This makes the pgstattuple operation slow for large tables. Alternatively, the pgstattuple_approx(relation) function provides a faster estimate of these metrics. While it’s less resource-intensive than pgstattuple, it could still be heavy on very large tables or busy systems. Consider running during off-peak hours or on a replica if available.
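
    The following is a minimal sketch of the approximate variant, run against the demo table from earlier:

    -- Returns approximate counterparts of the pgstattuple metrics
    -- (approx_tuple_count, approx_free_space, and so on), using the
    -- visibility map to avoid scanning pages that are all-visible
    SELECT * FROM pgstattuple_approx('pgbench_accounts_test');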

    Automating manual vacuum

    Regularly monitoring for bloat enables you to identify maintenance needs proactively, before performance suffers. The bloat metrics can also help fine-tune autovacuum settings to clean up space more aggressively if needed. After you identify the top 10 bloated tables, you can automate the VACUUM operation by using the pg_cron extension. pg_cron is a cron-based job scheduler for PostgreSQL that runs inside the database as an extension. It uses the same syntax as regular cron, but it allows you to schedule PostgreSQL commands directly from the database. The following code is an example of using the pg_cron function cron.schedule to set up a job to use VACUUM on a specific table every day at 23:00 (GMT):

    SELECT cron.schedule('manual vacuum', '0 23 * * *', 'VACUUM pgbench_accounts_test');
     schedule
    ----------
    1
    (1 row)

    After the job runs, the following query checks its history in the cron.job_run_details table:

    SELECT * FROM cron.job_run_details;
    jobid  | runid | job_pid | database | username | command                        | status    | return_message | start_time                    | end_time
    -------+-------+---------+----------+----------+--------------------------------+-----------+----------------+-------------------------------+-------------------------------
     1     | 1     | 3395    | postgres | adminuser| vacuum pgbench_accounts_test | succeeded | VACUUM         | 2024-05-11 21:10:00.050386+00 | 2024-05-11 21:10:00.072028+00
    (1 row)

    Diagnosing and resolving index bloat

    Like tables, indexes in PostgreSQL can experience bloat that wastes space and impacts performance. The pgstattuple extension detects index bloat through its pgstatindex function.

    The following query shows the index identifier, the total index size in bytes, and the average leaf density:

    SELECT i.indexrelid::regclass AS index,
           s.index_size,
           s.avg_leaf_density
    FROM pg_index AS i
    CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s
    WHERE indrelid = 'myschema.mytable'::regclass;

    Average leaf density is the percentage of useful data in the leaf pages of the index. Significantly bloated indexes can be rebuilt with REINDEX or pg_repack to eliminate dead space and restore optimal performance. It’s recommended to periodically check busy, high-churn indexes for bloat.
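
    For example, a bloated index can be rebuilt online with REINDEX CONCURRENTLY, available in PostgreSQL 12 and later (the index name below is a placeholder):

    -- Rebuilds the index without holding an exclusive lock for the duration
    REINDEX INDEX CONCURRENTLY myschema.mytable_idx;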

    Assessing index fragmentation

    Another valuable use of pgstattuple is identifying index fragmentation issues. Fragmentation occurs when index pages become scattered due to deletions, updates, and page splits. Heavily fragmented indexes have more dead tuples occupying space inefficiently.

    We can check the degree of fragmentation using leaf_fragmentation:

    SELECT * FROM pgstatindex('<index_name>');
    
    -[ RECORD 1 ]------+-------
    version            | 2
    tree_level         | 1
    index_size         | 196608
    root_block_no      | 3
    internal_pages     | 1
    leaf_pages         | 22
    empty_pages        | 0
    deleted_pages      | 0
    avg_leaf_density   | 64.48
    leaf_fragmentation | 13.64

    If leaf_fragmentation is high, the index is likely fragmented and a REINDEX should be considered. Rebuilding eliminates fragmentation and associated performance overhead.

    Best practices for using pgstattuple

    Consider the following best practices when using pgstattuple for PostgreSQL monitoring and maintenance:

    • To estimate bloat in PostgreSQL tables cheaply, use the check_postgres queries documented on the PostgreSQL wiki.
    • Use the pgstattuple extension to analyze the physical storage of tables and indexes, providing detailed statistics on space usage, including how much space is wasted due to bloat.
    • Rebuild significantly bloated tables and indexes to reclaim dead space.
    • Watch for a high dead_tuple_percent to identify bloat and fragmentation issues.
    • Focus maintenance on the tables and indexes most critical to workload performance.
    • Avoid running pgstattuple on highly active tables to prevent interference.
    • Use pgstattuple metrics to fine-tune autovacuum settings, as shown in the example after this list.
    • Combine pgstattuple with query analysis and logs for holistic database insights.
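
    As a sketch of that last tuning step, per-table autovacuum settings can be tightened for a table that pgstattuple shows is bloating quickly (the values here are illustrative, not recommendations):

    -- Trigger autovacuum after ~5% of rows are dead instead of the default 20%
    ALTER TABLE pgbench_accounts_test
        SET (autovacuum_vacuum_scale_factor = 0.05,
             autovacuum_vacuum_threshold = 1000);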

    Conclusion

    The pgstattuple extension serves as a powerful tool for uncovering crucial diagnostic metrics in PostgreSQL databases, exposing detailed storage statistics that help teams identify and address performance-impacting issues like bloat and index fragmentation. Working seamlessly with Aurora and Amazon RDS for PostgreSQL, the extension provides essential visibility into storage patterns and maintenance requirements. Following pgstattuple best practices is key to maintaining efficient, high-performing PostgreSQL databases. Organizations can further enhance their database management through AWS support options: AWS Enterprise Support, Enterprise On-Ramp, and Business Support customers can use AWS Countdown Premium engagements for optimization guidance, enabling teams to confidently implement best practices and maintain optimal performance while focusing on their core business objectives.

    We welcome your comments and feedback in the comments section.



    About the Authors

    Vivek Singh is a Principal Database Specialist Technical Account Manager with AWS focusing on Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL engines. He works with enterprise customers providing technical assistance on PostgreSQL operational performance and sharing database best practices. He has over 17 years of experience in open-source database solutions, and enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.

    Kiran Singh is a Senior Partner Solutions Architect and an Amazon RDS and Amazon Aurora specialist at AWS focusing on relational databases. She helps customers and partners build highly optimized, scalable, and secure solutions; modernize their architectures; and migrate their database workloads to AWS.

    Sagar Patel is a Principal Database Specialty Architect with the Professional Services team at Amazon Web Services. He works as a database migration specialist to provide technical guidance and help Amazon customers to migrate their on-premises databases to AWS.
