Supercharging vector search performance and relevance with pgvector 0.8.0 on Amazon Aurora PostgreSQL

Efficient vector similarity search has become a critical component for implementing semantic search, recommendation systems, and Retrieval Augmented Generation (RAG). Amazon Aurora PostgreSQL-Compatible Edition now supports pgvector 0.8.0, bringing significant improvements to vector search capabilities, making Aurora an even more compelling choice for AI-powered applications using PostgreSQL requiring semantic search and RAG.

In this post, we explore how pgvector 0.8.0 on Aurora PostgreSQL-Compatible delivers up to 9x faster query processing and 100x more relevant search results, addressing key scaling challenges that enterprise AI applications face when implementing vector search at scale.

pgvector 0.8.0 improvements

While vector databases have emerged as important infrastructure components, effective vector search is the mission-critical capability that powers semantic applications. As organizations scale their AI applications to process millions or billions of vectors, the limitations of earlier vector search implementations become apparent. pgvector 0.8.0 introduces several critical improvements that directly tackle these production challenges, particularly when working with filtered queries against large datasets:

Performance improvements – pgvector 0.8.0 offers up to a 5.7x improvement in query performance for specific query patterns compared to version 0.7.4. These enhancements will be explored in more detail later in this post.
Complete result sets – The new iterative_scan feature in 0.8.0 provides improved recall for filter queries that require an approximate nearest neighbor (ANN) index search, a critical improvement over previous versions that could return incomplete results.
Enhanced query planning – Better cost estimation in 0.8.0 leads to more efficient execution paths, such as choosing a traditional index like a B-tree for a complex filtered search.
Flexible performance tuning – The introduction of iterative_scan in two modes—relaxed_order and strict_order – provides tunable accuracy compared to performance trade-offs.

Challenges of overfiltering

To appreciate the significance of this release, it’s important to understand a fundamental challenge with vector search that many developers encounter when moving to production. In previous versions of pgvector, when you combined vector similarity search with traditional SQL filters, the filtering happened after the vector index scan completed. This approach led to a problem called overfiltering, where your query might return fewer results than expected, or even none at all. It also created performance and scalability issues, as the system would retrieve many vectors only to discard most during filtering.

Consider this scenario: You have an e-commerce service with millions of product embeddings. When searching for “summer dresses” with filters for “women’s clothing” and “size medium,” earlier versions of pgvector would follow these steps:

Scan the vector index to find the nearest neighbors to “summer dresses.”
Apply SQL filters like category = “women’s clothing” and size = “medium” to those neighbors.
Return the remaining results, which could be too few or even empty, especially if the filters matched a small fraction of the data.

HNSW (Hierarchical Navigable Small World) is an indexing algorithm in pgvector that accelerates vector similarity searches. It creates a multi-layered graph structure where vectors are connected to their nearest neighbors, allowing for efficient navigation through the vector space. With an HNSW index using default search settings (hnsw.ef_search = 40), if only 10% of your data matched the filter, you’d get roughly four usable results, regardless of how many relevant vectors were actually stored.

Iterative index scans

pgvector 0.8.0 introduces iterative index scans, which significantly improve query reliability and performance in filtered vector searches. The process works as follows:

Scan the vector index.
Apply any filters (e.g., metadata conditions).
Check if enough results meet both the vector similarity and filter criteria.
If not, continue scanning incrementally until either the required number of matches is found or a configurable limit is reached.

This approach avoids prematurely stopping due to overly strict filters (a common problem in prior versions), reducing false negatives and improving performance by avoiding full rescans or returning too few results. It’s particularly valuable for production-grade vector search applications with complex filtering requirements.Let’s see this in action with a practical example that demonstrates the power of this new feature with Aurora PostgreSQL-Compatible.First, let’s create a table with sample product data:

CREATE TABLE products (
    id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    title TEXT,
    description TEXT,
    category TEXT,
    embedding VECTOR(384)
);

-- Create an index on the embedding column using HNSW
CREATE INDEX products_embedding_idx ON products USING hnsw (embedding vector_cosine_ops);

-- Also create an index on the category column for efficient filtering
CREATE INDEX ON products (category);

Now, let’s imagine we have populated this table with tens of millions of product embeddings from various categories. When a user searches for products similar to “comfortable hiking boots” but wants only items from the outdoor gear category, they’d run a query like the following:

SELECT 
    product_name, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
ORDER BY distance
LIMIT 20;

Before pgvector 0.8.0

With previous versions, if you had 10 million products but only 50,000 were outdoor gear in stock (0.5%), the default HNSW scan would likely return only a few results, missing many relevant products. The workarounds were suboptimal:

Increase hnsw.ef_search to scan more vectors (hurting performance)
Create separate indexes for each category (complex to maintain)
Implement application-level paging (adding complexity)

With pgvector 0.8.0 on Aurora PostgreSQL

Let’s enable iterative scanning and see the difference:

-- Enable iterative scanning
SET hnsw.iterative_scan = 'relaxed_order';

-- Run the same query
SELECT 
    product_name, 
    category, 
    embedding <=> '[vector for "comfortable hiking boots"]' AS distance
FROM products 
WHERE category = 'outdoor gear'
ORDER BY distance
LIMIT 20;

Now, pgvector automatically continues scanning the index until it finds enough results to satisfy your query, making sure users see a complete and relevant set of results while maintaining performance. The threshold for “enough” is configurable: you can control how many tuples the system will scan before stopping. For HNSW indexes, this is governed by the hnsw.max_scan_tuples parameter, which defaults to 20,000. You can adjust this based on your dataset and performance goals:SET hnsw.max_scan_tuples = 20000;This gives you fine-grained control over the trade-off between recall (percentage of relevant results that are actually found) and performance during filtered vector search.Note: When using relaxed_order, you may need to reorder the results afterward to ensure proper ordering, for example:

SELECT * FROM (
  -- original query
) p ORDER BY p.distance * 1;

This forces a final reorder operation.

Configuration options for iterative scanning

pgvector 0.8.0 offers three modes for iterative scanning:

off – Traditional behavior, no iterative scanning (default)
strict_order – Iteratively scan while preserving exact distance ordering
relaxed_order – Iteratively scan with approximate ordering (better performance)

For most production use cases, relaxed_order provides the best balance of performance and accuracy. This is because relaxed_order allows pgvector to prioritize speed by returning results as they’re discovered rather than sorting them perfectly. It significantly reduces query latency while typically maintaining 95-99% of result quality compared to strict ordering. In real-world applications where sub-second response times matter more than perfect ranking (like recommendation systems and semantic search), this trade-off delivers substantial performance gains with minimal practical impact on user experience. In addition to the hnsw.max_scan_tuples parameter, you can also configure the hnsw. scan_mem_multiplier parameter to improve recall. This parameter specifies the max amount of memory to use, as a multiple of work_mem (1 by default).

Scaling RAG applications on Aurora PostgreSQL-Compatible

Let’s consider how these improvements impact a real-world RAG application:Imagine an online marketplace with 10 million products, each represented by a 384-dimensional vector embedding generated from product descriptions. Customers can search across the entire catalog or filter by category, price range, or rating. With previous versions of pgvector, filtered searches might miss relevant products unless you carefully tuned parameters for each query pattern. With pgvector 0.8.0 on Aurora PostgreSQL-Compatible, the database automatically adjusts to produce complete results.To demonstrate the real-world impact of pgvector 0.8.0’s improvements, we conducted extensive benchmarking of both pgvector 0.7.4 and 0.8.0 on Aurora PostgreSQL-Compatible using realistic ecommerce workloads at production scale. Our tests focused on scenarios that businesses encounter when deploying large-scale product search systems.

Benchmark setup

We created a synthetic dataset of 10 million products with realistic ecommerce characteristics spanning multiple categories. To make these benchmarks reproducible, here’s how we generated the dataset:

Data Generation Process

Product Metadata Generation: Using a Python script with libraries like faker and numpy, we generated realistic product metadata:

import pandas as pd
import numpy as np
from faker import Faker
from sentence_transformers import SentenceTransformer
import random
# Initialize faker for generating realistic text
fake = Faker()
# Initialize the sentence transformer model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# Define product categories with realistic distribution
categories = {
    'electronics': 0.20,  # 20% of products
    'clothing': 0.25,
    'home_goods': 0.15,
    'beauty': 0.10,
    'books': 0.05,
    'toys': 0.05,
    'sports': 0.05,
    'grocery': 0.10,
    'office': 0.05
}
# Generate synthetic product data
num_products = 10_000_000  # 10 million products
products = []
for i in range(num_products):
    # Select category based on distribution
    category = np.random.choice(
        list(categories.keys()), 
        p=list(categories.values())
    )
    
    # Generate title with category context
    title = fake.catch_phrase()
    
    # More detailed description
    description = fake.paragraph(nb_sentences=3)
    
    # Add some "smart" products for testing Query D
    if i % 50 == 0:  # 2% of products have "smart" in the title
        title = "Smart " + title
        
    products.append({
        'id': i+1,
        'title': title,
        'description': description,
        'category': category,
        # Embedding will be added in the next step
    })
    
    # Print progress
    if i % 100000 == 0:
        print(f"Generated {i} products")
# Convert to DataFrame
df = pd.DataFrame(products)

Embedding Generation: We generated 384-dimensional embeddings using the all-MiniLM-L6-v2 SentenceTransformer model:

# Generate embeddings in batches to manage memory
batch_size = 1000
for i in range(0, len(df), batch_size):
    end = min(i + batch_size, len(df))
    batch = df.iloc[i:end]
    
    # Generate embeddings from title + description
    texts = [f"{row.title}. {row.description}" for _, row in batch.iterrows()]
    embeddings = model.encode(texts)
    
    # Store embeddings
    for j, embedding in enumerate(embeddings):
        df.at[i+j, 'embedding'] = embedding.tolist()
    
    print(f"Generated embeddings for products {i} to {end}")

Data Loading to PostgreSQL: We used PostgreSQL’s COPY command for efficient data loading:

# Export data to CSV file (without embeddings column for faster export)
csv_file = "product_metadata.csv"
df[['id', 'title', 'description', 'category']].to_csv(csv_file, index=False)
# Export embeddings separately as binary file for efficient loading
embedding_file = "product_embeddings.binary"
with open(embedding_file, 'wb') as f:
    for _, row in df.iterrows():
        embedding = np.array(row['embedding'], dtype=np.float32)
        f.write(embedding.tobytes())
-- SQL to load data
CREATE TABLE products (
    id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    title TEXT,
    description TEXT,
    category TEXT,
    embedding vector(384)
);
-- Load metadata using COPY
COPY products (id, title, description, category)
FROM '/path/to/product_metadata.csv' 
CSV HEADER;
-- Load embeddings using custom function that reads binary data
SELECT load_embeddings('/path/to/product_embeddings.binary', 'products', 'embedding', 384);

The dataset we generated had the following characteristics:

10 million products across 9 categories with a realistic distribution
384-dimensional embeddings generated from product titles and descriptions
2% of products containing “smart” in the title for filtered query testing
Natural language text generated using Faker to ensure variety and realistic content

Additionally, a B-tree index on the category column was included to optimize filter operations commonly used in vector similarity searches. This dataset mirrors what organizations build for comprehensive product search systems.This setup can be reproduced using the code snippets above, adjusting the scale as needed for your testing environment.For these tests, we used a product catalog schema:

CREATE TABLE products (
    id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    title TEXT,
    description TEXT,
    category TEXT,
    embedding vector(384)
);

-- Create HNSW index for vector similarity search
CREATE INDEX products_embedding_idx ON products 
    USING hnsw (embedding vector_cosine_ops);

-- Create index on category for efficient filtering
CREATE INDEX ON products (category);

We ran the following sample queries:

Query A – Basic search (top 10):

SELECT 
    id, 
    title, 
    description, 
    embedding <=> %s::vector AS distance
FROM products
ORDER BY distance
LIMIT 10;

Query B – Large result set (top 1,000):

SELECT 
    id, 
    title, 
    description, 
    embedding <=> %s::vector AS distance
FROM products
ORDER BY distance
LIMIT 1000;

Query C – Category-filtered search:

SELECT 
    id, 
    title, 
    description, 
    embedding <=> %s::vector AS distance
FROM products
WHERE category = 'category1'
ORDER BY distance
LIMIT 10;

Query D – Complex filtered search:

SELECT 
    id, 
    title, 
    description, 
    category, 
    embedding <=> %s::vector AS distance
FROM products
WHERE category IN ('category1', 'category2', 'category3')
AND title ILIKE '%%smart%%'  -- Filters for titles containing 'smart'
ORDER BY distance
LIMIT 100;

Query E – Very large result set (10,000):

SELECT 
    id, 
    title, 
    description, 
    category, embedding <=> %s::vector AS distance
FROM products
ORDER BY distance
LIMIT 10000;

Testing methodology

The benchmark was designed to replicate real-world vector search scenarios while providing consistent measurements:

Infrastructure – Two separate Aurora PostgreSQL clusters running on db.r8g.4xlarge instances (powered by AWS Graviton4 processors)
Dataset – 10 million products with 384-dimensional embeddings
Index configuration – HNSW indexes with identical parameters across tests for fair comparison
Cache management – Buffer cache cleared between tests to provide consistent cold-start performance
Query Runs – Queries A, B, and C were executed 100 times each, whereas the more intensive Queries D and E were run 20 and 5 times, respectively, with reported latency values representing the average across the runs to provide statistical significance and minimize the impact of outliers
Test configurations – We used the following configurations:
- 0.7.4 baseline: ef_search=40
- 0.7.4: ef_search=200
- 0.8.0 baseline: ef_search=40, iterative_scan=off
- 0.8.0: ef_search=40, iterative_scan=strict_order
- 0.8.0: ef_search=40, iterative_scan=relaxed_order
- 0.8.0: ef_search=200, iterative_scan=strict_order
- 0.8.0: ef_search=200, iterative_scan=relaxed_order

Performance improvements

Our performance tests revealed significant improvements with pgvector 0.8.0 across the different query patterns. The following table shows p99 latency measurements (in milliseconds) for different configurations.

Query Type	0.7.4 baseline (ef_search=40)	0.7.4 (ef_search=200))	0.8.0 best config	Best configuration	Improvement
A	123.3 ms	394.1 ms	13.1 ms	ef_search=40, relaxed_order	9.4x faster
B	104.2 ms	341.4 ms	83.5 ms	ef_search=200, relaxed_order	1.25x faster
C	128.5 ms	333.4 ms	85.7 ms	ef_search=200, relaxed_order	1.5x faster
D	127.4 ms	318.6 ms	70.7 ms	ef_search=200, relaxed_order	1.8x faster
E	913.4 ms	427.4 ms	160.3 ms	ef_search=200, relaxed_order	5.7x faster

The performance improvements with pgvector 0.8.0 were substantial across the different query patterns, even at this 10-million product scale. For typical ecommerce queries that search within specific categories for products matching certain criteria, runtime dropped from over 120 milliseconds with pgvector 0.7.4 to just 70 milliseconds with 0.8.0, while returning more comprehensive results.What’s particularly impressive is how pgvector 0.8.0’s improved cost estimation capabilities automatically chose more efficient execution plans. In our filtered query tests, the planner correctly estimated costs and provided more realistic assessments of vector operations complexity. As demonstrated in the “Enhanced cost estimation and query planning” section below, pgvector 0.8.0’s cost estimates (7,224.63 cost units) more accurately reflect the actual computational demands of vector operations compared to version 0.7.4 (116.84 cost units), leading to better execution plan selections and more complete result sets.

Recall and result completeness enhancements

Perhaps more important than raw performance is the substantial improvement in result quality when working with millions of vectors. Our tests demonstrated significant differences in result completeness. Remember that recall means we return X out of Y expected results, with 100% being perfect recall:

Query	0.7.4 baseline (ef_search=40)	0.7.4 (ef_search=200)	0.8.0 with strict_order	0.8.0 with relaxed_order
Category-filtered search	10%	0%	100%	100%
Complex filtered search	1%	0%	100%	100%
Very large result set	5%	5%	100%	100%

For highly selective queries (products in a specific category), pgvector 0.7.4 returned only a fraction of requested results. With iterative scanning enabled in 0.8.0, we saw up to 100 times improvement in result completeness, substantially enhancing the user experience.The following is a query pattern we tested that demonstrates these improvements:

-- Enable iterative scanning (pgvector 0.8.0 only)
SET hnsw.iterative_scan = 'relaxed_order';

-- e-commerce query with multiple filters
SELECT id, title, category, embedding <=> '[query vector]'::vector AS distance
FROM products
WHERE category = 'category1'
ORDER BY distance
LIMIT 100;

Different iterative scan modes and ef_search values

We conducted a detailed comparison of different pgvector 0.8.0 configurations to understand the trade-offs between different iterative scan modes and ef_search values.

Configuration	Query A (top 10)	Query B (top 1000)	Query C (filtered)	Query D (complex)	Query E (large)
0.8.0 baseline (ef_search=40, iterative_scan=off)	19.3 ms	18.8 ms	20.0 ms	15.7 ms	99.8 ms
0.8.0 (ef_search=40, iterative_scan=strict_order)	18.1 ms	277.9 ms	197.1 ms	203.2 ms	344.0 ms
0.8.0 (ef_search=40, iterative_scan=relaxed_order)	13.1 ms	164.1 ms	150.8 ms	99.1 ms	397.9 ms
0.8.0 (ef_search=200, iterative_scan=strict_order)	28.8 ms	133.7 ms	128.5 ms	57.9 ms	207.6 ms
0.8.0 (ef_search=200, iterative_scan=relaxed_order)	30.7 ms	83.5 ms	85.7 ms	70.7 ms	160.3 ms

This detailed breakdown illustrates how different combinations affect performance across query types. For simple queries (A), a lower ef_search with relaxed_order provides the best performance. For complex filtered queries (C, D) and large result sets (B, E), higher ef_search values with relaxed_order typically offer the best balance of performance and completeness.The relaxed_order mode provides significantly better performance for most query types while still delivering complete result sets. For applications where exact distance ordering is less critical (like product recommendations), this mode offers an excellent balance of performance and results quality.

Enhanced cost estimation and query planning

Cost estimation in PostgreSQL refers to how the database predicts the computational resources (primarily CPU time and memory) required to execute a query. The query planner uses these cost estimates to determine the most efficient execution path.The query planning with pgvector 0.8.0 shows significant improvements in cost estimation accuracy and planning decisions. These enhancements enable PostgreSQL to make smarter choices about when to use vector indexes versus sequential scans, resulting in faster query execution, especially for complex queries combining vector similarity with traditional filters. To illustrate this, let’s examine the EXPLAIN output for a filtered query (Query C) from both versions.The following code is the pgvector 0.7.4 query plan (category filter):

Limit  (cost=116.84..217.48 rows=10 width=90) (actual time=3.919..4.088 rows=6 loops=1)
  ->  Index Scan using products_embedding_idx on products  (cost=116.84..10127632.67 rows=987333 width=90) (actual time=3.919..4.087 rows=6 loops=1)
        Order By: (embedding <=> '[vector]'::vector)
        Filter: (category = 'category1'::text)
        Rows Removed by Filter: 494
Planning Time: 0.170 ms
Execution Time: 4.105 ms

The following code is the pgvector 0.8.0 query plan with iterative_scan=relaxed_order:

Limit  (cost=7224.63..7423.21 rows=10 width=90) (actual time=5.554..7.506 rows=10 loops=1)
  ->  Index Scan using products_embedding_idx on products  (cost=7224.63..20202546.50 rows=1017000 width=90) (actual time=5.554..7.504 rows=10 loops=1)
        Order By: (embedding <=> '[vector]'::vector)
        Filter: (category = 'category1'::text)
        Rows Removed by Filter: 570
Planning Time: 0.177 ms
Execution Time: 7.524 ms

These query plans reveal several key improvements in 0.8.0:

Note: PostgreSQL cost units are arbitrary internal measurements that represent estimated CPU and I/O workload. They don’t directly translate to milliseconds or other standard units, but higher values indicate the planner’s expectation of more resource-intensive operations.

More realistic startup costs – The 0.8.0 planner estimates a startup cost of 7,224.63 cost units versus only 116.84 cost units in 0.7.4, which much better reflects the actual computational complexity of vector operations
Better row estimation – The 0.8.0 planner estimates 1,017,000 filtered rows compared to 987,333 in 0.7.4, showing a more accurate assessment of the filter’s selectivity
Complete results – Most importantly, 0.8.0 returns the 10 requested rows, whereas 0.7.4 only found 6
Efficient use of indexes – With the addition of a category index, both versions can efficiently filter results, but 0.8.0 is more thorough in its index traversal due to iterative scan

For complex filters (Query D), the differences are even more pronounced:The following code is the pgvector 0.7.4 query plan (complex filter):

Limit  (cost=116.84..455.46 rows=100 width=100) (actual time=4.130..4.669 rows=39 loops=1)
  ->  Index Scan using products_embedding_idx on products  (cost=116.84..10170125.25 rows=2993034 width=100) (actual time=4.129..4.665 rows=39 loops=1)
        Order By: (embedding <=> '[vector]'::vector)
        Filter: ((title ~~* '%smart%'::text) AND (category = ANY ('{category1,category2,category3}'::text[])))
        Rows Removed by Filter: 461
Planning Time: 1.378 ms
Execution Time: 4.692 ms

The following code is the pgvector 0.8.0 query plan with iterative_scan=relaxed_order:

Limit  (cost=7224.63..7884.49 rows=100 width=100) (actual time=2.909..3.060 rows=100 loops=1)
  ->  Index Scan using products_embedding_idx on products  (cost=7224.63..20245171.57 rows=3067027 width=100) (actual time=2.909..3.053 rows=100 loops=1)
        Order By: (embedding <=> '[vector]'::vector)
        Filter: ((title ~~* '%smart%'::text) AND (category = ANY ('{category1,category2,category3}'::text[])))
        Rows Removed by Filter: 1
Planning Time: 1.508 ms
Execution Time: 3.083 ms

The key difference here is that whereas 0.7.4 stops after finding only 39 rows (despite requesting 100), the 0.8.0 planner with iterative scan continues searching until it finds the 100 requested rows, with even better runtime.These examples demonstrate how the improved cost estimation in pgvector 0.8.0 leads to better execution strategies, particularly when combining vector searches with traditional database filters. The more accurate cost model helps the PostgreSQL optimizer make smarter decisions about execution paths, resulting in both better performance and complete result sets.

Scaling to production workloads

The Amazon Aurora I/O-Optimized cluster configuration offers enhanced price-performance and predictable pricing for I/O-intensive workloads, including e-commerce services, payment processing systems, recommendation systems, and RAG applications. This configuration enhances I/O performance with Aurora Optimized Reads via improved buffer cache management increasing write throughput and lowering latency. For dynamic or variable workloads, Amazon Aurora Serverless v2 provides a production-ready, auto-scaling option that adjusts capacity in fine-grained increments – ideal for quick starts and elastic scaling without sacrificing performance or availability.

The ability of Aurora PostgreSQL-Compatible to scale read capacity through read replicas, combined with pgvector 0.8.0’s more efficient query processing, provides a robust foundation for enterprise-scale ecommerce applications. Businesses can now confidently build semantic search, recommendation systems, and RAG applications that maintain high performance and result quality even as their product catalogs grow into millions or even billions of vectors.

Semantic search systems

A semantic search use case might include product search, document retrieval, and content recommendation. 0.8.0 excels in the following ways:

The noticeable speed improvements (up to 9.4 times faster for basic queries) allow for real-time search experiences
relaxed_order mode is ideal for search interfaces where slight variations in result ordering aren’t perceptible to users
Improved filtered queries (Queries C and D) enhance faceted or category-filtered search implementations
Complete result sets make sure users see the most relevant items, unlike 0.7.4, which often missed key results

An example implementation might be ecommerce product search where users expect sub-second results with filtering by product attributes.

Large-scale recommendation systems

A recommendation use case might include content recommendation, “similar items” features, and personalization. 0.8.0 offers the following benefits:

Much faster retrieval of larger result sets (Queries B and E) allows systems to fetch more candidates for postprocessing
Lower latency enables real-time recommendations on high-traffic systems
The performance on filtered queries supports contextual recommendations (for example, “similar products in this category”)
Better recall delivers diversity in recommendations

An example implementation might be media streaming services that need to recommend thousands of content items from a catalog of millions in real time.

RAG applications

A RAG use case might include AI systems that retrieve relevant context before generating responses. 0.8.0 offers the following improvements:

Lower latency improves end-to-end response time for AI systems
Better performance on filtered queries enables domain-specific retrieval
Complete result sets make sure the AI has access to the relevant context
Relaxed ordering is ideal because RAG typically uses top-k retrieval where exact ordering isn’t critical

An example implementation might be enterprise AI assistants that need to query company knowledge bases to answer user questions.

Get started with pgvector 0.8.0 on Aurora PostgreSQL-Compatible

To start using pgvector 0.8.0, complete the following steps:

Launch a new Aurora PostgreSQL cluster running versions 17.4, 16.8, 15.12, 14.17, or 13.20 and higher.
Connect to your DB cluster.
After connecting to your database, enable the extension:

CREATE EXTENSION IF NOT EXISTS vector;

Confirm you’re running the latest version for pgvector:

postgres=> SELECT extversion FROM pg_extension WHERE extname = 'vector';
 extversion 
------------
 0.8.0
(1 row)

Best Practices for pgvector 0.8.0 on Aurora PostgreSQL-Compatible

When deploying pgvector 0.8.0 in production, consider the following best practices to balance performance, recall, and filtering accuracy:

If you don’t need a vector index, don’t use it – For 100% recall and good performance with smaller datasets, a sequential scan might be more appropriate than a vector index. Only use vector indexes when you need the performance benefits for large datasets.

For example, if you have a table with only 10,000 product embeddings, a sequential scan might actually be faster than using a vector index:

      -- For small tables, a simple sequential scan often performs better
      -- than maintaining indexes, especially if recall is critical
      -- This query does a sequential scan with cosine distance
      SELECT 
         id, 
         title, 
         embedding <=> '[query vector]'::vector AS distance
      FROM small_products_table
      ORDER BY distance
      LIMIT 20;
      -- Compare EXPLAIN ANALYZE output of this vs. the same query with an index
      -- You'll often find that for small tables (10K-50K vectors), 
      -- sequential scans provide better recall with minimal performance impact

Creating vector indexes adds overhead for maintenance and storage, which only pays off when your dataset grows large enough that sequential scans become prohibitively expensive.

Indexing recommendations
1. Use HNSW with recommended parameters to ensure high search quality and efficient index construction:

CREATE INDEX my_table_embedding_hnsw_idx
  ON my_table USING hnsw (embedding vector_cosine_ops)
  WITH (
      ef_construction = 128,
      m = 16
  );

1. Create additional indexes on commonly filtered metadata columns (e.g., category, status, org_id) to improve performance of post-vector-filtering:

CREATE INDEX my_table_category_idx ON my_table(category);

Query-time tuning (search parameters)

Depending on your use case, adjust these parameters to optimize for recall or performance:

1. For maximum recall with filtering (such as, strict compliance or analytical use cases):

SET hnsw.iterative_scan = 'strict_order';  -- ensures completeness
SET hnsw.ef_search = 200;                  -- deeper graph traversal

1. For best performance (e.g., interactive or latency-sensitive workloads):

SET hnsw.iterative_scan = 'relaxed_order'; -- returns results faster
SET hnsw.ef_search = 40;                   -- faster traversal

1. For balanced scenarios (e.g., general-purpose retrieval):

SET hnsw.iterative_scan = 'relaxed_order'; -- enables iterative scan fallback
SET hnsw.ef_search = 200;                  -- configurable based on query complexity

These recommendations are domain-agnostic and should be tailored to your workload. As a general rule:

Use strict_order when completeness is critical.
Use relaxed_order when latency is more important than recall.
Tune ef_search higher for complex filtering or larger graphs.

Additionally, consider the following operational best practices:

Graviton4-based instances (R8g series) – These instances show excellent vector operation performance. Start with r8g.large for development and testing, and scale to r8g.2xlarge or 4xlarge for production workloads.
Balance memory and performance – Higher values of hnsw.ef_search provide more accurate results but consume more memory.
Index your filter columns – Create standard PostgreSQL indexes on columns used in WHERE clauses.
Monitor and tune – Use Amazon CloudWatch Database Insights to identify and optimize slow vector queries.
Consider partitioning for very large tables – For billions of vectors, table partitioning can improve both query performance and manageability.
Configure iterative scanning appropriately – Start with relaxed_order and adjust the threshold based on your application’s needs.

Conclusion

pgvector 0.8.0 on Aurora PostgreSQL-Compatible represents a significant advancement for organizations building production-scale AI applications. The introduction of iterative index scans solves one of the most challenging problems in vector search, and performance improvements across the board make Aurora PostgreSQL-Compatible an even more compelling option for vector storage and retrieval.As your vector data grows from thousands to millions or billions of embeddings, these optimizations make sure your applications remain responsive, accurate, and cost-effective.

Ready to get started? Refer to Amazon Aurora resources or the Amazon Aurora User Guide to learn more about integrating pgvector 0.8.0 into your applications.

About the authors

Shayon Sanyal is a Principal WW Specialist Solutions Architect for Data and AI and a Subject Matter Expert for Amazon’s flagship relational database, Amazon Aurora. He has over 15 years of experience managing relational databases and analytics workloads. Shayon’s relentless dedication to customer success allows him to help customers design scalable, secure, and robust cloud-based architectures. Shayon also helps service teams with design and delivery of pioneering features, such as generative AI.

Source: Read More