
    Supercharging vector search performance and relevance with pgvector 0.8.0 on Amazon Aurora PostgreSQL

    May 29, 2025

    Efficient vector similarity search has become a critical component for implementing semantic search, recommendation systems, and Retrieval Augmented Generation (RAG). Amazon Aurora PostgreSQL-Compatible Edition now supports pgvector 0.8.0, which brings significant improvements to vector search capabilities and makes Aurora an even more compelling choice for AI-powered applications on PostgreSQL that require semantic search and RAG.

    In this post, we explore how pgvector 0.8.0 on Aurora PostgreSQL-Compatible delivers up to 9x faster query processing and 100x more relevant search results, addressing key scaling challenges that enterprise AI applications face when implementing vector search at scale.

    pgvector 0.8.0 improvements

    While vector databases have emerged as important infrastructure components, effective vector search is the mission-critical capability that powers semantic applications. As organizations scale their AI applications to process millions or billions of vectors, the limitations of earlier vector search implementations become apparent. pgvector 0.8.0 introduces several critical improvements that directly tackle these production challenges, particularly when working with filtered queries against large datasets:

    • Performance improvements – pgvector 0.8.0 offers up to a 5.7x improvement in query performance for specific query patterns compared to version 0.7.4. These enhancements will be explored in more detail later in this post.
    • Complete result sets – The new iterative_scan feature in 0.8.0 provides improved recall for filter queries that require an approximate nearest neighbor (ANN) index search, a critical improvement over previous versions that could return incomplete results.
    • Enhanced query planning – Better cost estimation in 0.8.0 leads to more efficient execution paths, such as choosing a traditional index like a B-tree for a complex filtered search.
    • Flexible performance tuning – iterative_scan supports two modes, relaxed_order and strict_order, giving you a tunable trade-off between accuracy and performance.

    Challenges of overfiltering

    To appreciate the significance of this release, it’s important to understand a fundamental challenge with vector search that many developers encounter when moving to production. In previous versions of pgvector, when you combined vector similarity search with traditional SQL filters, the filtering happened after the vector index scan completed. This approach led to a problem called overfiltering, where your query might return fewer results than expected, or even none at all. It also created performance and scalability issues, as the system would retrieve many vectors only to discard most during filtering.

    Consider this scenario: You have an e-commerce service with millions of product embeddings. When searching for “summer dresses” with filters for “women’s clothing” and “size medium,” earlier versions of pgvector would follow these steps:

    1. Scan the vector index to find the nearest neighbors to “summer dresses.”
    2. Apply SQL filters like category = “women’s clothing” and size = “medium” to those neighbors.
    3. Return the remaining results, which could be too few or even empty, especially if the filters matched a small fraction of the data.

    HNSW (Hierarchical Navigable Small World) is an indexing algorithm in pgvector that accelerates vector similarity searches. It creates a multi-layered graph structure where vectors are connected to their nearest neighbors, allowing for efficient navigation through the vector space. With an HNSW index using default search settings (hnsw.ef_search = 40), if only 10% of your data matched the filter, you’d get roughly four usable results, regardless of how many relevant vectors were actually stored.
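
    To make this concrete, the following is a minimal sketch of the kind of filtered query described above and the classic pre-0.8.0 workaround. The table and filter columns are hypothetical, used only for illustration; the SET statement simply widens the candidate pool the HNSW scan returns before the filters are applied.

    -- Pre-0.8.0 behavior (illustrative): the HNSW scan returns at most
    -- hnsw.ef_search candidates (default 40), and the SQL filters are applied
    -- afterwards. If only ~10% of rows match, roughly 4 of the 40 candidates
    -- survive, no matter what LIMIT asks for.
    SHOW hnsw.ef_search;

    SELECT id, title
    FROM product_embeddings             -- hypothetical table
    WHERE category = 'womens_clothing'  -- hypothetical filter columns
      AND size = 'medium'
    ORDER BY embedding <=> '[vector for "summer dresses"]'::vector
    LIMIT 20;

    -- The typical pre-0.8.0 workaround: widen the candidate pool, at the cost
    -- of scanning (and then discarding) many more vectors.
    SET hnsw.ef_search = 400;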

    Iterative index scans

    pgvector 0.8.0 introduces iterative index scans, which significantly improve query reliability and performance in filtered vector searches. The process works as follows:

    1. Scan the vector index.
    2. Apply any filters (e.g., metadata conditions).
    3. Check if enough results meet both the vector similarity and filter criteria.
    4. If not, continue scanning incrementally until either the required number of matches is found or a configurable limit is reached.

    This approach avoids prematurely stopping due to overly strict filters (a common problem in prior versions), reducing false negatives and improving performance by avoiding full rescans or returning too few results. It’s particularly valuable for production-grade vector search applications with complex filtering requirements.

    Let’s see this in action with a practical example that demonstrates the power of this new feature with Aurora PostgreSQL-Compatible. First, let’s create a table with sample product data:

    CREATE TABLE products (
        id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
        title TEXT,
        description TEXT,
        category TEXT,
        embedding VECTOR(384)
    );
    
    -- Create an index on the embedding column using HNSW
    CREATE INDEX products_embedding_idx ON products USING hnsw (embedding vector_cosine_ops);
    
    -- Also create an index on the category column for efficient filtering
    CREATE INDEX ON products (category);
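
    A practical aside, as a hedged sketch: at tens of millions of rows, HNSW index builds are much faster when they are given memory and parallel workers and are created after the bulk load rather than before. The values below are illustrative; size them to your instance class.

    -- Rebuild the HNSW index after bulk loading (illustrative settings)
    DROP INDEX IF EXISTS products_embedding_idx;

    SET maintenance_work_mem = '8GB';          -- lets more of the graph build stay in memory
    SET max_parallel_maintenance_workers = 7;  -- parallel HNSW build workers (plus the leader)

    CREATE INDEX products_embedding_idx ON products
        USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 128);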

    Now, let’s imagine we have populated this table with tens of millions of product embeddings from various categories. When a user searches for products similar to “comfortable hiking boots” but wants only items from the outdoor gear category, they’d run a query like the following:

    SELECT 
        title, 
        category, 
        embedding <=> '[vector for "comfortable hiking boots"]' AS distance
    FROM products 
    WHERE category = 'outdoor gear'
    ORDER BY distance
    LIMIT 20;

    Before pgvector 0.8.0

    With previous versions, if you had 10 million products but only 50,000 (0.5%) were in the outdoor gear category, the default HNSW scan would likely return only a few results, missing many relevant products. The workarounds were suboptimal:

    • Increase hnsw.ef_search to scan more vectors (hurting performance)
    • Create separate indexes for each category (complex to maintain)
    • Implement application-level paging (adding complexity)

    With pgvector 0.8.0 on Aurora PostgreSQL

    Let’s enable iterative scanning and see the difference:

    -- Enable iterative scanning
    SET hnsw.iterative_scan = 'relaxed_order';
    
    -- Run the same query
    SELECT 
        title, 
        category, 
        embedding <=> '[vector for "comfortable hiking boots"]' AS distance
    FROM products 
    WHERE category = 'outdoor gear'
    ORDER BY distance
    LIMIT 20;

    Now, pgvector automatically continues scanning the index until it finds enough results to satisfy your query, making sure users see a complete and relevant set of results while maintaining performance. The threshold for “enough” is configurable: you can control how many tuples the system will scan before stopping. For HNSW indexes, this is governed by the hnsw.max_scan_tuples parameter, which defaults to 20,000. You can adjust this based on your dataset and performance goals:

    SET hnsw.max_scan_tuples = 20000;

    This gives you fine-grained control over the trade-off between recall (the percentage of relevant results that are actually found) and performance during filtered vector search.

    Note: When using relaxed_order, you may need to reorder the results afterward to ensure proper ordering, for example:

    SELECT * FROM (
      -- original query
    ) p ORDER BY p.distance * 1;

    This forces a final reorder operation.

    Configuration options for iterative scanning

    pgvector 0.8.0 offers three modes for iterative scanning:

    • off – Traditional behavior, no iterative scanning (default)
    • strict_order – Iteratively scan while preserving exact distance ordering
    • relaxed_order – Iteratively scan with approximate ordering (better performance)

    For most production use cases, relaxed_order provides the best balance of performance and accuracy. This is because relaxed_order allows pgvector to prioritize speed by returning results as they’re discovered rather than sorting them perfectly. It significantly reduces query latency while typically maintaining 95-99% of result quality compared to strict ordering. In real-world applications where sub-second response times matter more than perfect ranking (like recommendation systems and semantic search), this trade-off delivers substantial performance gains with minimal practical impact on user experience.

    In addition to the hnsw.max_scan_tuples parameter, you can also configure the hnsw.scan_mem_multiplier parameter to improve recall. This parameter specifies the maximum amount of memory the scan may use, expressed as a multiple of work_mem (1 by default). A combined tuning example is sketched below.
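
    The following is an illustrative session-level tuning block that combines these settings; the specific values are examples, not recommendations, and should be validated against your own workload.

    -- Illustrative iterative-scan tuning for pgvector 0.8.0
    SET hnsw.iterative_scan      = 'relaxed_order';  -- off | strict_order | relaxed_order
    SET hnsw.max_scan_tuples     = 40000;            -- cap on tuples visited per scan (default 20,000)
    SET hnsw.scan_mem_multiplier = 2;                -- memory cap as a multiple of work_mem (default 1)

    -- Inspect the current values
    SHOW hnsw.iterative_scan;
    SHOW hnsw.max_scan_tuples;
    SHOW hnsw.scan_mem_multiplier;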

    Scaling RAG applications on Aurora PostgreSQL-Compatible

    Let’s consider how these improvements impact a real-world RAG application. Imagine an online marketplace with 10 million products, each represented by a 384-dimensional vector embedding generated from product descriptions. Customers can search across the entire catalog or filter by category, price range, or rating. With previous versions of pgvector, filtered searches might miss relevant products unless you carefully tuned parameters for each query pattern. With pgvector 0.8.0 on Aurora PostgreSQL-Compatible, the database automatically adjusts to produce complete results.

    To demonstrate the real-world impact of pgvector 0.8.0’s improvements, we conducted extensive benchmarking of both pgvector 0.7.4 and 0.8.0 on Aurora PostgreSQL-Compatible using realistic ecommerce workloads at production scale. Our tests focused on scenarios that businesses encounter when deploying large-scale product search systems.

    Benchmark setup

    We created a synthetic dataset of 10 million products with realistic ecommerce characteristics spanning multiple categories. To make these benchmarks reproducible, here’s how we generated the dataset:

    Data Generation Process

    1. Product Metadata Generation: Using a Python script with libraries like faker and numpy, we generated realistic product metadata:
    import pandas as pd
    import numpy as np
    from faker import Faker
    from sentence_transformers import SentenceTransformer
    import random
    # Initialize faker for generating realistic text
    fake = Faker()
    # Initialize the sentence transformer model
    model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
    # Define product categories with realistic distribution
    categories = {
        'electronics': 0.20,  # 20% of products
        'clothing': 0.25,
        'home_goods': 0.15,
        'beauty': 0.10,
        'books': 0.05,
        'toys': 0.05,
        'sports': 0.05,
        'grocery': 0.10,
        'office': 0.05
    }
    # Generate synthetic product data
    num_products = 10_000_000  # 10 million products
    products = []
    for i in range(num_products):
        # Select category based on distribution
        category = np.random.choice(
            list(categories.keys()), 
            p=list(categories.values())
        )
        
        # Generate title with category context
        title = fake.catch_phrase()
        
        # More detailed description
        description = fake.paragraph(nb_sentences=3)
        
        # Add some "smart" products for testing Query D
        if i % 50 == 0:  # 2% of products have "smart" in the title
            title = "Smart " + title
            
        products.append({
            'id': i+1,
            'title': title,
            'description': description,
            'category': category,
            # Embedding will be added in the next step
        })
        
        # Print progress
        if i % 100000 == 0:
            print(f"Generated {i} products")
    # Convert to DataFrame
    df = pd.DataFrame(products)
    2. Embedding Generation: We generated 384-dimensional embeddings using the all-MiniLM-L6-v2 SentenceTransformer model:
    # Generate embeddings in batches to manage memory
    batch_size = 1000
    for i in range(0, len(df), batch_size):
        end = min(i + batch_size, len(df))
        batch = df.iloc[i:end]
        
        # Generate embeddings from title + description
        texts = [f"{row.title}. {row.description}" for _, row in batch.iterrows()]
        embeddings = model.encode(texts)
        
        # Store embeddings
        for j, embedding in enumerate(embeddings):
            df.at[i+j, 'embedding'] = embedding.tolist()
        
        print(f"Generated embeddings for products {i} to {end}")
    3. Data Loading to PostgreSQL: We used PostgreSQL’s COPY command for efficient data loading (an alternative single-COPY approach is sketched after this block):
    # Export data to CSV file (without embeddings column for faster export)
    csv_file = "product_metadata.csv"
    df[['id', 'title', 'description', 'category']].to_csv(csv_file, index=False)
    # Export embeddings separately as binary file for efficient loading
    embedding_file = "product_embeddings.binary"
    with open(embedding_file, 'wb') as f:
        for _, row in df.iterrows():
            embedding = np.array(row['embedding'], dtype=np.float32)
            f.write(embedding.tobytes())
    -- SQL to load data
    CREATE TABLE products (
        id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
        title TEXT,
        description TEXT,
        category TEXT,
        embedding vector(384)
    );
    -- Load metadata using COPY
    COPY products (id, title, description, category)
    FROM '/path/to/product_metadata.csv' 
    CSV HEADER;
    -- Load embeddings using custom function that reads binary data
    SELECT load_embeddings('/path/to/product_embeddings.binary', 'products', 'embedding', 384);
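
    As a simpler alternative (a sketch, with a hypothetical file path): pgvector accepts vector values written as text literals such as '[0.012,-0.034,...]', so if the embeddings are exported as an extra text column in the CSV, a single COPY can load metadata and vectors together.

    -- Alternative: load everything with one COPY; pgvector parses the bracketed
    -- text literal into the vector(384) column.
    COPY products (id, title, description, category, embedding)
    FROM '/path/to/products_with_embeddings.csv'
    CSV HEADER;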

    The dataset we generated had the following characteristics:

    • 10 million products across 9 categories with a realistic distribution
    • 384-dimensional embeddings generated from product titles and descriptions
    • 2% of products containing “smart” in the title for filtered query testing
    • Natural language text generated using Faker to ensure variety and realistic content

    Additionally, a B-tree index on the category column was included to optimize filter operations commonly used in vector similarity searches. This dataset mirrors what organizations build for comprehensive product search systems.

    This setup can be reproduced using the code snippets above, adjusting the scale as needed for your testing environment. For these tests, we used a product catalog schema:

    CREATE TABLE products (
        id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
        title TEXT,
        description TEXT,
        category TEXT,
        embedding vector(384)
    );
    
    -- Create HNSW index for vector similarity search
    CREATE INDEX products_embedding_idx ON products 
        USING hnsw (embedding vector_cosine_ops);
    
    -- Create index on category for efficient filtering
    CREATE INDEX ON products (category);

    We ran the following sample queries:

    • Query A – Basic search (top 10):
    SELECT 
        id, 
        title, 
        description, 
        embedding <=> %s::vector AS distance
    FROM products
    ORDER BY distance
    LIMIT 10;
    • Query B – Large result set (top 1,000):
    SELECT 
        id, 
        title, 
        description, 
        embedding <=> %s::vector AS distance
    FROM products
    ORDER BY distance
    LIMIT 1000;
    • Query C – Category-filtered search:
    SELECT 
        id, 
        title, 
        description, 
        embedding <=> %s::vector AS distance
    FROM products
    WHERE category = 'category1'
    ORDER BY distance
    LIMIT 10;
    • Query D – Complex filtered search:
    SELECT 
        id, 
        title, 
        description, 
        category, 
        embedding <=> %s::vector AS distance
    FROM products
    WHERE category IN ('category1', 'category2', 'category3')
    AND title ILIKE '%%smart%%'  -- Filters for titles containing 'smart'
    ORDER BY distance
    LIMIT 100;
    • Query E – Very large result set (10,000):
    SELECT 
        id, 
        title, 
        description, 
        category, embedding <=> %s::vector AS distance
    FROM products
    ORDER BY distance
    LIMIT 10000;

    Testing methodology

    The benchmark was designed to replicate real-world vector search scenarios while providing consistent measurements:

    • Infrastructure – Two separate Aurora PostgreSQL clusters running on db.r8g.4xlarge instances (powered by AWS Graviton4 processors)
    • Dataset – 10 million products with 384-dimensional embeddings
    • Index configuration – HNSW indexes with identical parameters across tests for fair comparison
    • Cache management – Buffer cache cleared between tests to provide consistent cold-start performance
    • Query runs – Queries A, B, and C were executed 100 times each, whereas the more intensive Queries D and E were run 20 and 5 times, respectively. Reported latency values are averages across the runs to minimize the impact of outliers.
    • Test configurations – We used the following configurations:
      • 0.7.4 baseline: ef_search=40
      • 0.7.4: ef_search=200
      • 0.8.0 baseline: ef_search=40, iterative_scan=off
      • 0.8.0: ef_search=40, iterative_scan=strict_order
      • 0.8.0: ef_search=40, iterative_scan=relaxed_order
      • 0.8.0: ef_search=200, iterative_scan=strict_order
      • 0.8.0: ef_search=200, iterative_scan=relaxed_order

    Performance improvements

    Our performance tests revealed significant improvements with pgvector 0.8.0 across the different query patterns. The following table shows p99 latency measurements (in milliseconds) for different configurations.

    Query Type | 0.7.4 baseline (ef_search=40) | 0.7.4 (ef_search=200) | 0.8.0 best config | Best configuration | Improvement
    A | 123.3 ms | 394.1 ms | 13.1 ms | ef_search=40, relaxed_order | 9.4x faster
    B | 104.2 ms | 341.4 ms | 83.5 ms | ef_search=200, relaxed_order | 1.25x faster
    C | 128.5 ms | 333.4 ms | 85.7 ms | ef_search=200, relaxed_order | 1.5x faster
    D | 127.4 ms | 318.6 ms | 70.7 ms | ef_search=200, relaxed_order | 1.8x faster
    E | 913.4 ms | 427.4 ms | 160.3 ms | ef_search=200, relaxed_order | 5.7x faster

    The performance improvements with pgvector 0.8.0 were substantial across the different query patterns, even at this 10-million product scale. For typical ecommerce queries that search within specific categories for products matching certain criteria, runtime dropped from over 120 milliseconds with pgvector 0.7.4 to just 70 milliseconds with 0.8.0, while returning more comprehensive results.

    What’s particularly impressive is how pgvector 0.8.0’s improved cost estimation capabilities automatically chose more efficient execution plans. In our filtered query tests, the planner correctly estimated costs and provided more realistic assessments of vector operations complexity. As demonstrated in the “Enhanced cost estimation and query planning” section below, pgvector 0.8.0’s cost estimates (7,224.63 cost units) more accurately reflect the actual computational demands of vector operations compared to version 0.7.4 (116.84 cost units), leading to better execution plan selections and more complete result sets.

    Recall and result completeness enhancements

    Perhaps more important than raw performance is the substantial improvement in result quality when working with millions of vectors. Our tests demonstrated significant differences in result completeness. Recall here means the fraction of expected results that are actually returned, with 100% being perfect recall:

    Query | 0.7.4 baseline (ef_search=40) | 0.7.4 (ef_search=200) | 0.8.0 with strict_order | 0.8.0 with relaxed_order
    Category-filtered search | 10% | 0% | 100% | 100%
    Complex filtered search | 1% | 0% | 100% | 100%
    Very large result set | 5% | 5% | 100% | 100%

    For highly selective queries (products in a specific category), pgvector 0.7.4 returned only a fraction of requested results. With iterative scanning enabled in 0.8.0, we saw up to 100 times improvement in result completeness, substantially enhancing the user experience.

    The following is a query pattern we tested that demonstrates these improvements:

    -- Enable iterative scanning (pgvector 0.8.0 only)
    SET hnsw.iterative_scan = 'relaxed_order';
    
    -- e-commerce query with multiple filters
    SELECT id, title, category, embedding <=> '[query vector]'::vector AS distance
    FROM products
    WHERE category = 'category1'
    ORDER BY distance
    LIMIT 100;

    Different iterative scan modes and ef_search values

    We conducted a detailed comparison of different pgvector 0.8.0 configurations to understand the trade-offs between different iterative scan modes and ef_search values.

    Configuration | Query A (top 10) | Query B (top 1,000) | Query C (filtered) | Query D (complex) | Query E (large)
    0.8.0 baseline (ef_search=40, iterative_scan=off) | 19.3 ms | 18.8 ms | 20.0 ms | 15.7 ms | 99.8 ms
    0.8.0 (ef_search=40, iterative_scan=strict_order) | 18.1 ms | 277.9 ms | 197.1 ms | 203.2 ms | 344.0 ms
    0.8.0 (ef_search=40, iterative_scan=relaxed_order) | 13.1 ms | 164.1 ms | 150.8 ms | 99.1 ms | 397.9 ms
    0.8.0 (ef_search=200, iterative_scan=strict_order) | 28.8 ms | 133.7 ms | 128.5 ms | 57.9 ms | 207.6 ms
    0.8.0 (ef_search=200, iterative_scan=relaxed_order) | 30.7 ms | 83.5 ms | 85.7 ms | 70.7 ms | 160.3 ms

    This detailed breakdown illustrates how different combinations affect performance across query types. For simple queries (A), a lower ef_search with relaxed_order provides the best performance. For complex filtered queries (C, D) and large result sets (B, E), higher ef_search values with relaxed_order typically offer the best balance of performance and completeness.

    The relaxed_order mode provides significantly better performance for most query types while still delivering complete result sets. For applications where exact distance ordering is less critical (like product recommendations), this mode offers an excellent balance of performance and result quality.

    Enhanced cost estimation and query planning

    Cost estimation in PostgreSQL refers to how the database predicts the computational resources (primarily CPU time and memory) required to execute a query. The query planner uses these cost estimates to determine the most efficient execution path.

    Query planning with pgvector 0.8.0 shows significant improvements in cost estimation accuracy and planning decisions. These enhancements enable PostgreSQL to make smarter choices about when to use vector indexes versus sequential scans, resulting in faster query execution, especially for complex queries combining vector similarity with traditional filters. To illustrate this, let’s examine the EXPLAIN output for a filtered query (Query C) from both versions.

    The following code is the pgvector 0.7.4 query plan (category filter):

    Limit  (cost=116.84..217.48 rows=10 width=90) (actual time=3.919..4.088 rows=6 loops=1)
      ->  Index Scan using products_embedding_idx on products  (cost=116.84..10127632.67 rows=987333 width=90) (actual time=3.919..4.087 rows=6 loops=1)
            Order By: (embedding <=> '[vector]'::vector)
            Filter: (category = 'category1'::text)
            Rows Removed by Filter: 494
    Planning Time: 0.170 ms
    Execution Time: 4.105 ms

    The following code is the pgvector 0.8.0 query plan with iterative_scan=relaxed_order:

    Limit  (cost=7224.63..7423.21 rows=10 width=90) (actual time=5.554..7.506 rows=10 loops=1)
      ->  Index Scan using products_embedding_idx on products  (cost=7224.63..20202546.50 rows=1017000 width=90) (actual time=5.554..7.504 rows=10 loops=1)
            Order By: (embedding <=> '[vector]'::vector)
            Filter: (category = 'category1'::text)
            Rows Removed by Filter: 570
    Planning Time: 0.177 ms
    Execution Time: 7.524 ms

    These query plans reveal several key improvements in 0.8.0:

    Note: PostgreSQL cost units are arbitrary internal measurements that represent estimated CPU and I/O workload. They don’t directly translate to milliseconds or other standard units, but higher values indicate the planner’s expectation of more resource-intensive operations.

    • More realistic startup costs – The 0.8.0 planner estimates a startup cost of 7,224.63 cost units versus only 116.84 cost units in 0.7.4, which much better reflects the actual computational complexity of vector operations
    • Better row estimation – The 0.8.0 planner estimates 1,017,000 filtered rows compared to 987,333 in 0.7.4, showing a more accurate assessment of the filter’s selectivity
    • Complete results – Most importantly, 0.8.0 returns the 10 requested rows, whereas 0.7.4 only found 6
    • Efficient use of indexes – With the addition of a category index, both versions can efficiently filter results, but 0.8.0 is more thorough in its index traversal due to iterative scan

    For complex filters (Query D), the differences are even more pronounced.

    The following code is the pgvector 0.7.4 query plan (complex filter):

    Limit  (cost=116.84..455.46 rows=100 width=100) (actual time=4.130..4.669 rows=39 loops=1)
      ->  Index Scan using products_embedding_idx on products  (cost=116.84..10170125.25 rows=2993034 width=100) (actual time=4.129..4.665 rows=39 loops=1)
            Order By: (embedding <=> '[vector]'::vector)
            Filter: ((title ~~* '%smart%'::text) AND (category = ANY ('{category1,category2,category3}'::text[])))
            Rows Removed by Filter: 461
    Planning Time: 1.378 ms
    Execution Time: 4.692 ms

    The following code is the pgvector 0.8.0 query plan with iterative_scan=relaxed_order:

    Limit  (cost=7224.63..7884.49 rows=100 width=100) (actual time=2.909..3.060 rows=100 loops=1)
      ->  Index Scan using products_embedding_idx on products  (cost=7224.63..20245171.57 rows=3067027 width=100) (actual time=2.909..3.053 rows=100 loops=1)
            Order By: (embedding <=> '[vector]'::vector)
            Filter: ((title ~~* '%smart%'::text) AND (category = ANY ('{category1,category2,category3}'::text[])))
            Rows Removed by Filter: 1
    Planning Time: 1.508 ms
    Execution Time: 3.083 ms

    The key difference here is that whereas 0.7.4 stops after finding only 39 rows (despite requesting 100), the 0.8.0 planner with iterative scan continues searching until it finds the 100 requested rows, with even better runtime.

    These examples demonstrate how the improved cost estimation in pgvector 0.8.0 leads to better execution strategies, particularly when combining vector searches with traditional database filters. The more accurate cost model helps the PostgreSQL optimizer make smarter decisions about execution paths, resulting in both better performance and complete result sets.

    Scaling to production workloads

    The Amazon Aurora I/O-Optimized cluster configuration offers enhanced price-performance and predictable pricing for I/O-intensive workloads, including e-commerce services, payment processing systems, recommendation systems, and RAG applications, increasing write throughput and lowering latency. It pairs with Aurora Optimized Reads, which improves I/O performance through better buffer cache management. For dynamic or variable workloads, Amazon Aurora Serverless v2 provides a production-ready, auto-scaling option that adjusts capacity in fine-grained increments, making it ideal for quick starts and elastic scaling without sacrificing performance or availability.

    The ability of Aurora PostgreSQL-Compatible to scale read capacity through read replicas, combined with pgvector 0.8.0’s more efficient query processing, provides a robust foundation for enterprise-scale ecommerce applications. Businesses can now confidently build semantic search, recommendation systems, and RAG applications that maintain high performance and result quality even as their product catalogs grow into millions or even billions of vectors.

    Semantic search systems

    A semantic search use case might include product search, document retrieval, and content recommendation. 0.8.0 excels in the following ways:

    • The noticeable speed improvements (up to 9.4 times faster for basic queries) allow for real-time search experiences
    • relaxed_order mode is ideal for search interfaces where slight variations in result ordering aren’t perceptible to users
    • Improved filtered queries (Queries C and D) enhance faceted or category-filtered search implementations
    • Complete result sets make sure users see the most relevant items, unlike 0.7.4, which often missed key results

    An example implementation might be ecommerce product search where users expect sub-second results with filtering by product attributes.

    Large-scale recommendation systems

    A recommendation use case might include content recommendation, “similar items” features, and personalization. 0.8.0 offers the following benefits:

    • Much faster retrieval of larger result sets (Queries B and E) allows systems to fetch more candidates for postprocessing
    • Lower latency enables real-time recommendations on high-traffic systems
    • The performance on filtered queries supports contextual recommendations (for example, “similar products in this category”)
    • Better recall delivers diversity in recommendations

    An example implementation might be media streaming services that need to recommend thousands of content items from a catalog of millions in real time.

    RAG applications

    A RAG use case might include AI systems that retrieve relevant context before generating responses. 0.8.0 offers the following improvements:

    • Lower latency improves end-to-end response time for AI systems
    • Better performance on filtered queries enables domain-specific retrieval
    • Complete result sets make sure the AI has access to the relevant context
    • Relaxed ordering is ideal because RAG typically uses top-k retrieval where exact ordering isn’t critical

    An example implementation might be enterprise AI assistants that need to query company knowledge bases to answer user questions.
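
    The following is a minimal sketch of the retrieval step such an assistant might run. The table, columns, and filter are hypothetical; the pattern is a metadata-filtered top-k search with relaxed ordering for lower latency.

    -- RAG retrieval sketch (illustrative schema)
    SET hnsw.iterative_scan = 'relaxed_order';

    SELECT chunk_id, document_id, content,
           embedding <=> '[question embedding]'::vector AS distance
    FROM knowledge_chunks              -- hypothetical table of document chunks
    WHERE org_id = 42                  -- hypothetical tenant/domain filter
    ORDER BY distance
    LIMIT 8;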

    Get started with pgvector 0.8.0 on Aurora PostgreSQL-Compatible

    To start using pgvector 0.8.0, complete the following steps:

    1. Launch a new Aurora PostgreSQL cluster running versions 17.4, 16.8, 15.12, 14.17, or 13.20 and higher.
    2. Connect to your DB cluster.
    3. After connecting to your database, enable the extension:

    CREATE EXTENSION IF NOT EXISTS vector;

    4. Confirm you’re running the latest version of pgvector (if an older version of the extension is already installed, see the note after the output below):
    postgres=> SELECT extversion FROM pg_extension WHERE extname = 'vector';
     extversion 
    ------------
     0.8.0
    (1 row)
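
    If an earlier version of the extension is already installed (for example, on a cluster upgraded from an older engine release), it can typically be updated in place; confirm the versions available on your Aurora release first.

    -- Update an existing installation of the extension
    ALTER EXTENSION vector UPDATE;
    -- or pin an explicit target version
    ALTER EXTENSION vector UPDATE TO '0.8.0';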

    Best practices for pgvector 0.8.0 on Aurora PostgreSQL-Compatible

    When deploying pgvector 0.8.0 in production, consider the following best practices to balance performance, recall, and filtering accuracy:

    1. If you don’t need a vector index, don’t use it – For 100% recall and good performance with smaller datasets, a sequential scan might be more appropriate than a vector index. Only use vector indexes when you need the performance benefits for large datasets.

    For example, if you have a table with only 10,000 product embeddings, a sequential scan might actually be faster than using a vector index:

          -- For small tables, a simple sequential scan often performs better
          -- than maintaining indexes, especially if recall is critical
          -- This query does a sequential scan with cosine distance
          SELECT 
             id, 
             title, 
             embedding <=> '[query vector]'::vector AS distance
          FROM small_products_table
          ORDER BY distance
          LIMIT 20;
          -- Compare EXPLAIN ANALYZE output of this vs. the same query with an index
          -- You'll often find that for small tables (10K-50K vectors), 
          -- sequential scans provide better recall with minimal performance impact

    Creating vector indexes adds overhead for maintenance and storage, which only pays off when your dataset grows large enough that sequential scans become prohibitively expensive.

    2. Indexing recommendations
      1. Use HNSW with recommended parameters to ensure high search quality and efficient index construction:
    CREATE INDEX my_table_embedding_hnsw_idx
      ON my_table USING hnsw (embedding vector_cosine_ops)
      WITH (
          ef_construction = 128,
          m = 16
      );
      2. Create additional indexes on commonly filtered metadata columns (e.g., category, status, org_id) to improve performance of post-vector-filtering:

    CREATE INDEX my_table_category_idx ON my_table(category);

    3. Query-time tuning (search parameters)

    Depending on your use case, adjust these parameters to optimize for recall or performance:

      1. For maximum recall with filtering (such as strict compliance or analytical use cases):
    SET hnsw.iterative_scan = 'strict_order';  -- ensures completeness
    SET hnsw.ef_search = 200;                  -- deeper graph traversal
      2. For best performance (e.g., interactive or latency-sensitive workloads):
    SET hnsw.iterative_scan = 'relaxed_order'; -- returns results faster
    SET hnsw.ef_search = 40;                   -- faster traversal
      3. For balanced scenarios (e.g., general-purpose retrieval):
    SET hnsw.iterative_scan = 'relaxed_order'; -- enables iterative scan fallback
    SET hnsw.ef_search = 200;                  -- configurable based on query complexity

    These recommendations are domain-agnostic and should be tailored to your workload. As a general rule:

    • Use strict_order when completeness is critical.
    • Use relaxed_order when latency is more important than recall.
    • Tune ef_search higher for complex filtering or larger graphs.

    Additionally, consider the following operational best practices:

    • Graviton4-based instances (R8g series) – These instances show excellent vector operation performance. Start with r8g.large for development and testing, and scale to r8g.2xlarge or 4xlarge for production workloads.
    • Balance memory and performance – Higher values of hnsw.ef_search provide more accurate results but consume more memory.
    • Index your filter columns – Create standard PostgreSQL indexes on columns used in WHERE clauses.
    • Monitor and tune – Use Amazon CloudWatch Database Insights to identify and optimize slow vector queries.
    • Consider partitioning for very large tables – For billions of vectors, table partitioning can improve both query performance and manageability (a minimal sketch follows this list).
    • Configure iterative scanning appropriately – Start with relaxed_order and adjust the threshold based on your application’s needs.
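
    The following is a minimal partitioning sketch under illustrative assumptions (list partitioning on category, two example partitions). Each partition gets its own HNSW index, which keeps individual graphs smaller and lets the planner prune partitions when queries filter on the partition key.

    -- Partitioned variant of the products table (illustrative)
    CREATE TABLE products_part (
        id bigint GENERATED BY DEFAULT AS IDENTITY,
        title TEXT,
        description TEXT,
        category TEXT,
        embedding vector(384)
    ) PARTITION BY LIST (category);

    CREATE TABLE products_part_electronics PARTITION OF products_part
        FOR VALUES IN ('electronics');
    CREATE TABLE products_part_clothing PARTITION OF products_part
        FOR VALUES IN ('clothing');

    -- One HNSW index per partition
    CREATE INDEX ON products_part_electronics USING hnsw (embedding vector_cosine_ops);
    CREATE INDEX ON products_part_clothing USING hnsw (embedding vector_cosine_ops);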

    Conclusion

    pgvector 0.8.0 on Aurora PostgreSQL-Compatible represents a significant advancement for organizations building production-scale AI applications. The introduction of iterative index scans solves one of the most challenging problems in vector search, and performance improvements across the board make Aurora PostgreSQL-Compatible an even more compelling option for vector storage and retrieval.

    As your vector data grows from thousands to millions or billions of embeddings, these optimizations make sure your applications remain responsive, accurate, and cost-effective.

    Ready to get started? Refer to Amazon Aurora resources or the Amazon Aurora User Guide to learn more about integrating pgvector 0.8.0 into your applications.


    About the author

    Shayon Sanyal is a Principal WW Specialist Solutions Architect for Data and AI and a Subject Matter Expert for Amazon’s flagship relational database, Amazon Aurora. He has over 15 years of experience managing relational databases and analytics workloads. Shayon’s relentless dedication to customer success allows him to help customers design scalable, secure, and robust cloud-based architectures. Shayon also helps service teams with design and delivery of pioneering features, such as generative AI.
