Efficient vector similarity search has become a critical component for implementing semantic search, recommendation systems, and Retrieval Augmented Generation (RAG). Amazon Aurora PostgreSQL-Compatible Edition now supports pgvector 0.8.0, bringing significant improvements to vector search capabilities, making Aurora an even more compelling choice for AI-powered applications using PostgreSQL requiring semantic search and RAG.
In this post, we explore how pgvector 0.8.0 on Aurora PostgreSQL-Compatible delivers up to 9x faster query processing and 100x more relevant search results, addressing key scaling challenges that enterprise AI applications face when implementing vector search at scale.
pgvector 0.8.0 improvements
While vector databases have emerged as important infrastructure components, effective vector search is the mission-critical capability that powers semantic applications. As organizations scale their AI applications to process millions or billions of vectors, the limitations of earlier vector search implementations become apparent. pgvector 0.8.0 introduces several critical improvements that directly tackle these production challenges, particularly when working with filtered queries against large datasets:
- Performance improvements – pgvector 0.8.0 offers up to a 5.7x improvement in query performance for specific query patterns compared to version 0.7.4. These enhancements will be explored in more detail later in this post.
- Complete result sets – The new iterative_scan feature in 0.8.0 provides improved recall for filter queries that require an approximate nearest neighbor (ANN) index search, a critical improvement over previous versions that could return incomplete results.
- Enhanced query planning – Better cost estimation in 0.8.0 leads to more efficient execution paths, such as choosing a traditional index like a B-tree for a complex filtered search.
- Flexible performance tuning – The introduction of iterative_scan with two modes, relaxed_order and strict_order, provides a tunable trade-off between accuracy and performance.
Challenges of overfiltering
To appreciate the significance of this release, it’s important to understand a fundamental challenge with vector search that many developers encounter when moving to production. In previous versions of pgvector, when you combined vector similarity search with traditional SQL filters, the filtering happened after the vector index scan completed. This approach led to a problem called overfiltering, where your query might return fewer results than expected, or even none at all. It also created performance and scalability issues, as the system would retrieve many vectors only to discard most during filtering.
Consider this scenario: You have an e-commerce service with millions of product embeddings. When searching for “summer dresses” with filters for “women’s clothing” and “size medium,” earlier versions of pgvector would follow these steps:
- Scan the vector index to find the nearest neighbors to “summer dresses.”
- Apply SQL filters like category = “women’s clothing” and size = “medium” to those neighbors.
- Return the remaining results, which could be too few or even empty, especially if the filters matched a small fraction of the data.
HNSW (Hierarchical Navigable Small World) is an indexing algorithm in pgvector that accelerates vector similarity searches. It creates a multi-layered graph structure where vectors are connected to their nearest neighbors, allowing for efficient navigation through the vector space. With an HNSW index using default search settings (hnsw.ef_search = 40), if only 10% of your data matched the filter, you’d get roughly four usable results, regardless of how many relevant vectors were actually stored.
Iterative index scans
pgvector 0.8.0 introduces iterative index scans, which significantly improve query reliability and performance in filtered vector searches. The process works as follows:
- Scan the vector index.
- Apply any filters (e.g., metadata conditions).
- Check if enough results meet both the vector similarity and filter criteria.
- If not, continue scanning incrementally until either the required number of matches is found or a configurable limit is reached.
This approach avoids prematurely stopping due to overly strict filters (a common problem in prior versions), reducing false negatives and improving performance by avoiding full rescans or returning too few results. It’s particularly valuable for production-grade vector search applications with complex filtering requirements.

Let’s see this in action with a practical example that demonstrates the power of this new feature with Aurora PostgreSQL-Compatible. First, let’s create a table with sample product data:
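The original DDL isn’t included in this extract; a minimal sketch (table and column names are assumptions) might look like the following:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical product table with a 384-dimensional embedding column
CREATE TABLE products (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL,
    description text,
    category text NOT NULL,
    size text,
    price numeric(10,2),
    in_stock boolean DEFAULT true,
    embedding vector(384)
);

-- HNSW index for approximate nearest neighbor search on the embeddings
CREATE INDEX products_embedding_idx ON products
    USING hnsw (embedding vector_cosine_ops);
```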
Now, let’s imagine we have populated this table with tens of millions of product embeddings from various categories. When a user searches for products similar to “comfortable hiking boots” but wants only items from the outdoor gear category, they’d run a query like the following:
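The query text isn’t preserved in this extract; assuming a hypothetical products table with name, category, and embedding vector(384) columns, and with :query_embedding standing in for the 384-dimensional embedding of “comfortable hiking boots”, it might look like this:

```sql
SELECT id, name, price
FROM products
WHERE category = 'outdoor gear'
ORDER BY embedding <=> :query_embedding
LIMIT 10;
```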
Before pgvector 0.8.0
With previous versions, if you had 10 million products but only 50,000 were outdoor gear in stock (0.5%), the default HNSW scan would likely return only a few results, missing many relevant products. The workarounds were suboptimal:
- Increase hnsw.ef_search to scan more vectors (hurting performance)
- Create separate indexes for each category (complex to maintain)
- Implement application-level paging (adding complexity)
With pgvector 0.8.0 on Aurora PostgreSQL
Let’s enable iterative scanning and see the difference:
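The exact statements aren’t preserved in this extract; assuming the same hypothetical products table and :query_embedding placeholder, enabling iterative scanning might look like this:

```sql
-- Keep scanning the index until enough filtered matches are found
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 40;

SELECT id, name, price
FROM products
WHERE category = 'outdoor gear'
ORDER BY embedding <=> :query_embedding
LIMIT 10;
```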
Now, pgvector automatically continues scanning the index until it finds enough results to satisfy your query, making sure users see a complete and relevant set of results while maintaining performance. The threshold for “enough” is configurable: you can control how many tuples the system will scan before stopping. For HNSW indexes, this is governed by the hnsw.max_scan_tuples parameter, which defaults to 20,000. You can adjust this based on your dataset and performance goals:

SET hnsw.max_scan_tuples = 20000;
This gives you fine-grained control over the trade-off between recall (the percentage of relevant results that are actually found) and performance during filtered vector search.

Note: When using relaxed_order, you may need to reorder the results afterward to ensure proper ordering, for example:
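The original snippet isn’t included here; pgvector’s documented pattern uses a materialized CTE (table and column names assume the hypothetical products table used in this post):

```sql
WITH relaxed_results AS MATERIALIZED (
    SELECT id, name, embedding <=> :query_embedding AS distance
    FROM products
    WHERE category = 'outdoor gear'
    ORDER BY distance
    LIMIT 10
)
SELECT id, name
FROM relaxed_results
ORDER BY distance;
```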
This forces a final reorder operation.
Configuration options for iterative scanning
pgvector 0.8.0 offers three modes for iterative scanning:
- off – Traditional behavior, no iterative scanning (default)
- strict_order – Iteratively scan while preserving exact distance ordering
- relaxed_order – Iteratively scan with approximate ordering (better performance)
For most production use cases, relaxed_order provides the best balance of performance and accuracy. This is because relaxed_order allows pgvector to prioritize speed by returning results as they’re discovered rather than sorting them perfectly. It significantly reduces query latency while typically maintaining 95-99% of result quality compared to strict ordering. In real-world applications where sub-second response times matter more than perfect ranking (like recommendation systems and semantic search), this trade-off delivers substantial performance gains with minimal practical impact on user experience.

In addition to the hnsw.max_scan_tuples parameter, you can also configure the hnsw.scan_mem_multiplier parameter to improve recall. This parameter specifies the maximum amount of memory the iterative scan can use, as a multiple of work_mem (1 by default).
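For example, to let the iterative scan phase use up to twice work_mem:

```sql
SET hnsw.scan_mem_multiplier = 2;
```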
Scaling RAG applications on Aurora PostgreSQL-Compatible
Let’s consider how these improvements impact a real-world RAG application. Imagine an online marketplace with 10 million products, each represented by a 384-dimensional vector embedding generated from product descriptions. Customers can search across the entire catalog or filter by category, price range, or rating. With previous versions of pgvector, filtered searches might miss relevant products unless you carefully tuned parameters for each query pattern. With pgvector 0.8.0 on Aurora PostgreSQL-Compatible, the database automatically adjusts to produce complete results.

To demonstrate the real-world impact of pgvector 0.8.0’s improvements, we conducted extensive benchmarking of both pgvector 0.7.4 and 0.8.0 on Aurora PostgreSQL-Compatible using realistic ecommerce workloads at production scale. Our tests focused on scenarios that businesses encounter when deploying large-scale product search systems.
Benchmark setup
We created a synthetic dataset of 10 million products with realistic ecommerce characteristics spanning multiple categories. To make these benchmarks reproducible, here’s how we generated the dataset:
Data generation process
- Product metadata generation – Using a Python script with libraries like faker and numpy, we generated realistic product metadata.
- Embedding generation – We generated 384-dimensional embeddings using the all-MiniLM-L6-v2 SentenceTransformer model.
- Data loading to PostgreSQL – We used PostgreSQL’s COPY command for efficient data loading.
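The original scripts aren’t reproduced in this extract. The following is a dependency-free sketch of the same pipeline: it stands in random values for the faker-generated text and model-generated embeddings, and writes a CSV file suitable for loading with COPY:

```python
import csv
import random

random.seed(42)

# Nine illustrative categories; the real dataset used a realistic distribution
CATEGORIES = ["Electronics", "Home", "Outdoor Gear", "Clothing", "Toys",
              "Books", "Sports", "Beauty", "Grocery"]

def make_product(i):
    """Generate one row of synthetic product metadata.

    The original benchmark used faker for realistic titles/descriptions and
    the all-MiniLM-L6-v2 SentenceTransformer for 384-dim embeddings; this
    sketch substitutes random values so it runs without those dependencies.
    """
    # ~2% of titles contain "smart", matching the filtered-query test setup
    title = f"{'smart ' if random.random() < 0.02 else ''}product {i}"
    embedding = [round(random.uniform(-1, 1), 6) for _ in range(384)]
    return {
        "id": i,
        "title": title,
        "category": random.choice(CATEGORIES),
        "price": round(random.uniform(5, 500), 2),
        # pgvector's text input format: '[x1,x2,...]'
        "embedding": "[" + ",".join(map(str, embedding)) + "]",
    }

def write_copy_file(path, n):
    """Write n products as CSV, ready for COPY ... FROM ... WITH (FORMAT csv, HEADER)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["id", "title", "category", "price", "embedding"])
        writer.writeheader()
        for i in range(n):
            writer.writerow(make_product(i))

write_copy_file("products.csv", 1000)
```

In the real benchmark, the embedding column would come from the all-MiniLM-L6-v2 model via sentence-transformers rather than random values, and n would be 10 million.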
The dataset we generated had the following characteristics:
- 10 million products across 9 categories with a realistic distribution
- 384-dimensional embeddings generated from product titles and descriptions
- 2% of products containing “smart” in the title for filtered query testing
- Natural language text generated using Faker to ensure variety and realistic content
Additionally, a B-tree index on the category column was included to optimize filter operations commonly used in vector similarity searches. This dataset mirrors what organizations build for comprehensive product search systems. This setup can be reproduced using the code snippets above, adjusting the scale as needed for your testing environment.

For these tests, we used a product catalog schema:
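The schema listing isn’t preserved in this extract; based on the dataset characteristics described above, it might have looked like the following (names and exact types are assumptions):

```sql
CREATE TABLE products (
    id bigserial PRIMARY KEY,
    title text NOT NULL,
    description text,
    category text NOT NULL,
    price numeric(10,2),
    rating numeric(2,1),
    embedding vector(384)
);

-- HNSW index on the embeddings; identical parameters across all test runs
CREATE INDEX products_embedding_idx ON products
    USING hnsw (embedding vector_cosine_ops);

-- B-tree index on the filter column
CREATE INDEX products_category_idx ON products (category);
```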
We ran the following sample queries:
- Query A – Basic search (top 10)
- Query B – Large result set (top 1,000)
- Query C – Category-filtered search
- Query D – Complex filtered search
- Query E – Very large result set (top 10,000)
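The query text isn’t preserved in this extract; based on the descriptions above and the dataset design (2% of titles containing “smart”), the five queries might have looked like the following, with :q standing in for a query embedding. The specific filter values are illustrative assumptions:

```sql
-- Query A: basic search (top 10)
SELECT id FROM products ORDER BY embedding <=> :q LIMIT 10;

-- Query B: large result set (top 1,000)
SELECT id FROM products ORDER BY embedding <=> :q LIMIT 1000;

-- Query C: category-filtered search
SELECT id FROM products
WHERE category = 'Electronics'
ORDER BY embedding <=> :q LIMIT 10;

-- Query D: complex filtered search (category plus title keyword)
SELECT id FROM products
WHERE category = 'Electronics' AND title ILIKE '%smart%'
ORDER BY embedding <=> :q LIMIT 100;

-- Query E: very large result set (top 10,000)
SELECT id FROM products ORDER BY embedding <=> :q LIMIT 10000;
```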
Testing methodology
The benchmark was designed to replicate real-world vector search scenarios while providing consistent measurements:
- Infrastructure – Two separate Aurora PostgreSQL clusters running on db.r8g.4xlarge instances (powered by AWS Graviton4 processors)
- Dataset – 10 million products with 384-dimensional embeddings
- Index configuration – HNSW indexes with identical parameters across tests for fair comparison
- Cache management – Buffer cache cleared between tests to provide consistent cold-start performance
- Query runs – Queries A, B, and C were executed 100 times each, whereas the more intensive Queries D and E were run 20 and 5 times, respectively; reported latency values represent the average across runs to provide statistical significance and minimize the impact of outliers
- Test configurations – We used the following configurations:
- 0.7.4 baseline: ef_search=40
- 0.7.4: ef_search=200
- 0.8.0 baseline: ef_search=40, iterative_scan=off
- 0.8.0: ef_search=40, iterative_scan=strict_order
- 0.8.0: ef_search=40, iterative_scan=relaxed_order
- 0.8.0: ef_search=200, iterative_scan=strict_order
- 0.8.0: ef_search=200, iterative_scan=relaxed_order
Performance improvements
Our performance tests revealed significant improvements with pgvector 0.8.0 across the different query patterns. The following table shows p99 latency measurements (in milliseconds) for different configurations.
| Query Type | 0.7.4 baseline (ef_search=40) | 0.7.4 (ef_search=200) | 0.8.0 best config | Best configuration | Improvement |
| --- | --- | --- | --- | --- | --- |
| A | 123.3 ms | 394.1 ms | 13.1 ms | ef_search=40, relaxed_order | 9.4x faster |
| B | 104.2 ms | 341.4 ms | 83.5 ms | ef_search=200, relaxed_order | 1.25x faster |
| C | 128.5 ms | 333.4 ms | 85.7 ms | ef_search=200, relaxed_order | 1.5x faster |
| D | 127.4 ms | 318.6 ms | 70.7 ms | ef_search=200, relaxed_order | 1.8x faster |
| E | 913.4 ms | 427.4 ms | 160.3 ms | ef_search=200, relaxed_order | 5.7x faster |
The performance improvements with pgvector 0.8.0 were substantial across the different query patterns, even at this 10-million product scale. For typical ecommerce queries that search within specific categories for products matching certain criteria, runtime dropped from over 120 milliseconds with pgvector 0.7.4 to just 70 milliseconds with 0.8.0, while returning more comprehensive results.

What’s particularly impressive is how pgvector 0.8.0’s improved cost estimation capabilities automatically chose more efficient execution plans. In our filtered query tests, the planner correctly estimated costs and provided more realistic assessments of the complexity of vector operations. As demonstrated in the “Enhanced cost estimation and query planning” section below, pgvector 0.8.0’s cost estimates (7,224.63 cost units) more accurately reflect the actual computational demands of vector operations compared to version 0.7.4 (116.84 cost units), leading to better execution plan selections and more complete result sets.
Recall and result completeness enhancements
Perhaps more important than raw performance is the substantial improvement in result quality when working with millions of vectors. Our tests demonstrated significant differences in result completeness. Remember that recall means we return X out of Y expected results, with 100% being perfect recall:
| Query | 0.7.4 baseline (ef_search=40) | 0.7.4 (ef_search=200) | 0.8.0 with strict_order | 0.8.0 with relaxed_order |
| --- | --- | --- | --- | --- |
| Category-filtered search | 10% | 0% | 100% | 100% |
| Complex filtered search | 1% | 0% | 100% | 100% |
| Very large result set | 5% | 5% | 100% | 100% |
For highly selective queries (products in a specific category), pgvector 0.7.4 returned only a fraction of requested results. With iterative scanning enabled in 0.8.0, we saw up to a 100 times improvement in result completeness, substantially enhancing the user experience.

The following is a query pattern we tested that demonstrates these improvements:
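The pattern isn’t preserved in this extract; it resembled a category-filtered search with iterative scanning enabled (the filter value is illustrative, and :q stands in for a query embedding):

```sql
SET hnsw.iterative_scan = 'relaxed_order';

SELECT id, title
FROM products
WHERE category = 'Electronics'
ORDER BY embedding <=> :q
LIMIT 10;
```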
Different iterative scan modes and ef_search values
We conducted a detailed comparison of different pgvector 0.8.0 configurations to understand the trade-offs between different iterative scan modes and ef_search values.
Configuration | Query A (top 10) | Query B (top 1000) | Query C (filtered) | Query D (complex) | Query E (large) |
0.8.0 baseline (ef_search=40, iterative_scan=off) | 19.3 ms | 18.8 ms | 20.0 ms | 15.7 ms | 99.8 ms |
0.8.0 (ef_search=40, iterative_scan=strict_order) | 18.1 ms | 277.9 ms | 197.1 ms | 203.2 ms | 344.0 ms |
0.8.0 (ef_search=40, iterative_scan=relaxed_order) | 13.1 ms | 164.1 ms | 150.8 ms | 99.1 ms | 397.9 ms |
0.8.0 (ef_search=200, iterative_scan=strict_order) | 28.8 ms | 133.7 ms | 128.5 ms | 57.9 ms | 207.6 ms |
0.8.0 (ef_search=200, iterative_scan=relaxed_order) | 30.7 ms | 83.5 ms | 85.7 ms | 70.7 ms | 160.3 ms |
This detailed breakdown illustrates how different combinations affect performance across query types. For simple queries (A), a lower ef_search with relaxed_order provides the best performance. For complex filtered queries (C, D) and large result sets (B, E), higher ef_search values with relaxed_order typically offer the best balance of performance and completeness.

The relaxed_order mode provides significantly better performance for most query types while still delivering complete result sets. For applications where exact distance ordering is less critical (like product recommendations), this mode offers an excellent balance of performance and result quality.
Enhanced cost estimation and query planning
Cost estimation in PostgreSQL refers to how the database predicts the computational resources (primarily CPU time and memory) required to execute a query. The query planner uses these cost estimates to determine the most efficient execution path.

The query planning with pgvector 0.8.0 shows significant improvements in cost estimation accuracy and planning decisions. These enhancements enable PostgreSQL to make smarter choices about when to use vector indexes versus sequential scans, resulting in faster query execution, especially for complex queries combining vector similarity with traditional filters. To illustrate this, let’s examine the EXPLAIN output for a filtered query (Query C) from both versions.

The following code is the pgvector 0.7.4 query plan (category filter):
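The captured plan isn’t included in this extract; the following is an abbreviated, illustrative reconstruction. Only the figures cited in this section (startup cost 116.84, an estimated 987,333 filtered rows, and 6 of 10 requested rows returned) come from our tests; the remaining values and node details are placeholders:

```
Limit  (cost=116.84..... rows=10)  (actual rows=6 loops=1)
  ->  Index Scan using products_embedding_idx on products
        (cost=116.84..... rows=987333)  (actual rows=6 loops=1)
        Order By: (embedding <=> $1)
        Filter: (category = 'Electronics')
```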
The following code is the pgvector 0.8.0 query plan with iterative_scan=relaxed_order:
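Again, the captured plan isn’t included in this extract; an abbreviated, illustrative reconstruction follows, with only the cited figures (startup cost 7,224.63, an estimated 1,017,000 filtered rows, all 10 requested rows returned) taken from our tests:

```
Limit  (cost=7224.63..... rows=10)  (actual rows=10 loops=1)
  ->  Index Scan using products_embedding_idx on products
        (cost=7224.63..... rows=1017000)  (actual rows=10 loops=1)
        Order By: (embedding <=> $1)
        Filter: (category = 'Electronics')
```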
These query plans reveal several key improvements in 0.8.0:
Note: PostgreSQL cost units are arbitrary internal measurements that represent estimated CPU and I/O workload. They don’t directly translate to milliseconds or other standard units, but higher values indicate the planner’s expectation of more resource-intensive operations.
- More realistic startup costs – The 0.8.0 planner estimates a startup cost of 7,224.63 cost units versus only 116.84 cost units in 0.7.4, which better reflects the actual computational complexity of vector operations
- Better row estimation – The 0.8.0 planner estimates 1,017,000 filtered rows compared to 987,333 in 0.7.4, showing a more accurate assessment of the filter’s selectivity
- Complete results – Most importantly, 0.8.0 returns the 10 requested rows, whereas 0.7.4 only found 6
- Efficient use of indexes – With the addition of a category index, both versions can efficiently filter results, but 0.8.0 is more thorough in its index traversal due to iterative scan
For complex filters (Query D), the differences are even more pronounced. The following code is the pgvector 0.7.4 query plan (complex filter):
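This plan also isn’t preserved in this extract; an abbreviated, illustrative reconstruction (only the 39-of-100 row count is from our tests; filter expressions and other details are placeholders) follows:

```
Limit  (cost=... rows=100)  (actual rows=39 loops=1)
  ->  Index Scan using products_embedding_idx on products
        Order By: (embedding <=> $1)
        Filter: ((category = 'Electronics') AND (title ~~* '%smart%'))
```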
The following code is the pgvector 0.8.0 query plan with iterative_scan=relaxed_order:
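As above, this is an abbreviated, illustrative reconstruction rather than the captured output; the key fact, taken from our tests, is that all 100 requested rows are returned:

```
Limit  (cost=... rows=100)  (actual rows=100 loops=1)
  ->  Index Scan using products_embedding_idx on products
        Order By: (embedding <=> $1)
        Filter: ((category = 'Electronics') AND (title ~~* '%smart%'))
```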
The key difference here is that whereas 0.7.4 stops after finding only 39 rows (despite requesting 100), the 0.8.0 planner with iterative scan continues searching until it finds the 100 requested rows, with even better runtime.

These examples demonstrate how the improved cost estimation in pgvector 0.8.0 leads to better execution strategies, particularly when combining vector searches with traditional database filters. The more accurate cost model helps the PostgreSQL optimizer make smarter decisions about execution paths, resulting in both better performance and complete result sets.
Scaling to production workloads
The Amazon Aurora I/O-Optimized cluster configuration offers enhanced price-performance and predictable pricing for I/O-intensive workloads, including e-commerce services, payment processing systems, recommendation systems, and RAG applications. This configuration enhances I/O performance through Aurora Optimized Reads and improved buffer cache management, increasing write throughput and lowering latency. For dynamic or variable workloads, Amazon Aurora Serverless v2 provides a production-ready, auto-scaling option that adjusts capacity in fine-grained increments, making it ideal for quick starts and elastic scaling without sacrificing performance or availability.
The ability of Aurora PostgreSQL-Compatible to scale read capacity through read replicas, combined with pgvector 0.8.0’s more efficient query processing, provides a robust foundation for enterprise-scale ecommerce applications. Businesses can now confidently build semantic search, recommendation systems, and RAG applications that maintain high performance and result quality even as their product catalogs grow into millions or even billions of vectors.
Semantic search systems
A semantic search use case might include product search, document retrieval, and content recommendation. 0.8.0 excels in the following ways:
- The noticeable speed improvements (up to 9.4 times faster for basic queries) allow for real-time search experiences
- relaxed_order mode is ideal for search interfaces where slight variations in result ordering aren’t perceptible to users
- Improved filtered queries (Queries C and D) enhance faceted or category-filtered search implementations
- Complete result sets make sure users see the most relevant items, unlike 0.7.4, which often missed key results
An example implementation might be ecommerce product search where users expect sub-second results with filtering by product attributes.
Large-scale recommendation systems
A recommendation use case might include content recommendation, “similar items” features, and personalization. 0.8.0 offers the following benefits:
- Much faster retrieval of larger result sets (Queries B and E) allows systems to fetch more candidates for postprocessing
- Lower latency enables real-time recommendations on high-traffic systems
- The performance on filtered queries supports contextual recommendations (for example, “similar products in this category”)
- Better recall delivers diversity in recommendations
An example implementation might be media streaming services that need to recommend thousands of content items from a catalog of millions in real time.
RAG applications
A RAG use case might include AI systems that retrieve relevant context before generating responses. 0.8.0 offers the following improvements:
- Lower latency improves end-to-end response time for AI systems
- Better performance on filtered queries enables domain-specific retrieval
- Complete result sets make sure the AI has access to the relevant context
- Relaxed ordering is ideal because RAG typically uses top-k retrieval where exact ordering isn’t critical
An example implementation might be enterprise AI assistants that need to query company knowledge bases to answer user questions.
Get started with pgvector 0.8.0 on Aurora PostgreSQL-Compatible
To start using pgvector 0.8.0, complete the following steps:
- Launch a new Aurora PostgreSQL cluster running versions 17.4, 16.8, 15.12, 14.17, or 13.20 and higher.
- Connect to your DB cluster.
- After connecting to your database, enable the extension:
CREATE EXTENSION IF NOT EXISTS vector;
- Confirm you’re running the latest version of pgvector:
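For example, by querying the extension catalog:

```sql
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'vector';
```

This should report extversion 0.8.0 (or later).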
Best practices for pgvector 0.8.0 on Aurora PostgreSQL-Compatible
When deploying pgvector 0.8.0 in production, consider the following best practices to balance performance, recall, and filtering accuracy:
- If you don’t need a vector index, don’t use it – For 100% recall and good performance with smaller datasets, a sequential scan might be more appropriate than a vector index. Only use vector indexes when you need the performance benefits for large datasets.
For example, if you have a table with only 10,000 product embeddings, a sequential scan might actually be faster than using a vector index:
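The original example isn’t included here; a sketch (the table name is hypothetical) of comparing the plan for a small table without a vector index:

```sql
-- With ~10,000 rows and no vector index, PostgreSQL uses a sequential scan,
-- which computes exact distances for every row (100% recall) and can still
-- complete in milliseconds at this scale
EXPLAIN ANALYZE
SELECT id, name
FROM small_products
ORDER BY embedding <=> :query_embedding
LIMIT 10;
```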
Creating vector indexes adds overhead for maintenance and storage, which only pays off when your dataset grows large enough that sequential scans become prohibitively expensive.
- Indexing recommendations
- Use HNSW with recommended parameters to ensure high search quality and efficient index construction.
- Create additional indexes on commonly filtered metadata columns (e.g., category, status, org_id) to improve the performance of post-vector filtering:
CREATE INDEX my_table_category_idx ON my_table(category);
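The HNSW index definition itself isn’t shown in this extract; pgvector’s default build parameters (m = 16, ef_construction = 64) are a reasonable starting point:

```sql
CREATE INDEX my_table_embedding_idx ON my_table
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```

Raising m and ef_construction improves recall at the cost of longer index builds and more memory.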
- Query-time tuning (search parameters)
Depending on your use case, adjust these parameters to optimize for recall or performance:
- For maximum recall with filtering (such as strict compliance or analytical use cases), use strict_order with a higher ef_search and max_scan_tuples.
- For best performance (e.g., interactive or latency-sensitive workloads), use relaxed_order with a lower ef_search.
- For balanced scenarios (e.g., general-purpose retrieval), use relaxed_order with a moderate ef_search.
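The parameter listings aren’t preserved in this extract; the following illustrative settings (the specific values are assumptions to tune for your workload) map to the three scenarios above:

```sql
-- Maximum recall with filtering (strict compliance / analytical)
SET hnsw.iterative_scan = 'strict_order';
SET hnsw.ef_search = 400;
SET hnsw.max_scan_tuples = 100000;

-- Best performance (interactive / latency-sensitive)
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 40;

-- Balanced (general-purpose retrieval)
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.ef_search = 200;
```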
These recommendations are domain-agnostic and should be tailored to your workload. As a general rule:
- Use strict_order when completeness is critical.
- Use relaxed_order when latency is more important than recall.
- Tune ef_search higher for complex filtering or larger graphs.
Additionally, consider the following operational best practices:
- Graviton4-based instances (R8g series) – These instances show excellent vector operation performance. Start with r8g.large for development and testing, and scale to r8g.2xlarge or 4xlarge for production workloads.
- Balance memory and performance – Higher values of hnsw.ef_search provide more accurate results but consume more memory.
- Index your filter columns – Create standard PostgreSQL indexes on columns used in WHERE clauses.
- Monitor and tune – Use Amazon CloudWatch Database Insights to identify and optimize slow vector queries.
- Consider partitioning for very large tables – For billions of vectors, table partitioning can improve both query performance and manageability.
- Configure iterative scanning appropriately – Start with relaxed_order and adjust the threshold based on your application’s needs.
Conclusion
pgvector 0.8.0 on Aurora PostgreSQL-Compatible represents a significant advancement for organizations building production-scale AI applications. The introduction of iterative index scans solves one of the most challenging problems in vector search, and performance improvements across the board make Aurora PostgreSQL-Compatible an even more compelling option for vector storage and retrieval.

As your vector data grows from thousands to millions or billions of embeddings, these optimizations make sure your applications remain responsive, accurate, and cost-effective.
Ready to get started? Refer to Amazon Aurora resources or the Amazon Aurora User Guide to learn more about integrating pgvector 0.8.0 into your applications.
About the authors
Shayon Sanyal is a Principal WW Specialist Solutions Architect for Data and AI and a Subject Matter Expert for Amazon’s flagship relational database, Amazon Aurora. He has over 15 years of experience managing relational databases and analytics workloads. Shayon’s relentless dedication to customer success allows him to help customers design scalable, secure, and robust cloud-based architectures. Shayon also helps service teams with design and delivery of pioneering features, such as generative AI.