
    Key considerations when choosing a database for your generative AI applications

    July 26, 2024

    On January 4, 2024, CMU professor Andy Pavlo, known for his database lectures, published his 2023 database review, primarily focusing on the rise of vector databases. These innovative data storage solutions have taken center stage. As the popularity of generative artificial intelligence (AI) models continues to soar, the spotlight has shifted to include databases with vector storage and search capabilities, which provide a cost-effective mechanism for extending the capabilities of a foundation model (FM). With advancements in distributed computing, cloud-centered architectures, and specialized hardware accelerators, databases with vector search are likely to become even more powerful and scalable.

    In this post, we explore the key factors to consider when selecting a database for your generative AI applications. We focus on high-level considerations and service characteristics that are relevant to fully managed databases with vector search capabilities currently available on AWS. We examine how these databases differ in terms of their behavior and performance, and provide guidance on how to make an informed decision based on your specific requirements. By understanding these essential aspects, you will be well-equipped to choose the most suitable database for your generative AI workloads, achieving optimal performance, scalability, and ease of implementation.

    Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) is the process of optimizing the output of a large language model (LLM) so that it references an authoritative knowledge base outside of its training data sources before generating a response. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model. It’s a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.

The key steps in the RAG workflow are as follows (a minimal code sketch follows the list):

1. The data source is processed to create document chunks, which are then passed through an embeddings model (such as Amazon Titan Text Embeddings V2) to convert the text into numerical representations called embeddings. These embeddings capture the semantic meaning of the text and are stored, along with the original document chunks, in a database optimized for vector search.
2. The user input is converted into an embedding using the same embeddings model. A semantic search is performed on the database using the user input embedding as the query vector, retrieving the top-k most relevant document chunks based on their proximity in the vector space. The retrieved chunks serve as context for the subsequent generation step.
3. The user input is used as a prompt, and the retrieved document chunks are used for prompt augmentation. The augmented prompt is fed into an FM (such as Anthropic Claude 3 on Amazon Bedrock), which generates a response based on its pre-trained knowledge and the provided context from the database. The generated response is more informed and contextually appropriate due to the retrieved information.
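To make the workflow concrete, here is a minimal sketch in Python using the AWS SDK (boto3) against Amazon Bedrock. The model IDs are illustrative examples, and `vector_store` is a placeholder for whichever vector-capable database you choose; treat this as a sketch of the flow, not a production implementation.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    """Convert text to an embedding with Amazon Titan Text Embeddings V2."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def answer(question: str, vector_store) -> str:
    # Retrieval: embed the question and fetch the top-k nearest chunks.
    # `vector_store.search` is a placeholder for whichever database you chose.
    chunks = vector_store.search(embed(question), k=5)

    # Augmentation: prepend the retrieved chunks to the prompt.
    context = "\n\n".join(chunks)
    prompt = f"Use the following context to answer.\n\n{context}\n\nQuestion: {question}"

    # Generation: call an FM such as Anthropic Claude 3 on Amazon Bedrock.
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```

The three functions map directly onto the indexing, retrieval, and generation steps above; only the `vector_store` implementation changes as you move between the databases discussed in this post.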

When building a RAG application, one of the first decisions is which database is right for your use case. The discussion around databases and generative AI has been vast and multifaceted; this post seeks to simplify it, focusing primarily on how to decide which database with vector search capabilities is right for you. If you’re seeking guidance on LLMs, refer to Generative AI with LLMs.

    Vector search on AWS

As of publishing this post, AWS offers the following databases with a full suite of vector capabilities, including vector storage, indexing, retrieval, and search. AWS also offers databases that integrate with Knowledge Bases for Amazon Bedrock.

    • Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL are fully managed relational databases that support the pgvector open source vector search extension. Aurora PostgreSQL also supports vector search with its Amazon Aurora Serverless v2 deployment option.
    • Amazon OpenSearch Service is a fully managed service for running OpenSearch, an open source search engine and analytics suite. OpenSearch Service supports vector search through its vector engine, on both managed clusters and its Amazon OpenSearch Serverless deployment option.
    • Amazon Neptune Analytics is an analytics graph database engine with graph algorithms and vector search, for combining graphs with RAG in approaches such as GraphRAG.
    • Vector search for Amazon DocumentDB (with MongoDB compatibility) is available on Amazon DocumentDB 5.0 instance-based clusters.
    • Amazon MemoryDB is an in-memory database whose vector search provides some of the fastest performance at the highest recall rates among popular vector databases on AWS, with single-digit millisecond vector search and update latencies even at the highest levels of recall.
    • The Amazon DynamoDB zero-ETL integration with OpenSearch Service provides advanced search capabilities, such as full-text search and vector search, on your Amazon DynamoDB data.

    In the following sections, we explore key considerations that can help you choose the right database for your generative AI applications:

    • Familiarity
    • Ease of implementation
    • Scalability
    • Performance

    Familiarity

Opting for familiar technology, when possible, will ultimately save developer hours and reduce complexity. When developer teams are already familiar with a particular database engine, using the same engine for vector search leverages existing knowledge for a streamlined experience. Instead of learning a new skillset, developers can apply their current skills, tools, frameworks, and processes to adopt a new feature of an existing database engine.

For example, a team of database engineers may already manage a fleet of 100 relational databases hosted on Aurora PostgreSQL. If they need to support a new vector search requirement for their applications, they should start by evaluating the pgvector extension on their existing Aurora PostgreSQL databases, as in the sketch below.
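A minimal evaluation sketch, assuming Aurora PostgreSQL with the pgvector extension available and the psycopg (version 3) driver; the table and column names are illustrative:

```python
import os
import psycopg  # psycopg 3

def to_vec(values):
    # pgvector literal format, e.g. '[0.1,0.2,0.3]'
    return "[" + ",".join(str(v) for v in values) + "]"

with psycopg.connect(os.environ["DATABASE_URL"]) as conn, conn.cursor() as cur:
    # Enable pgvector and create a table with a 1,024-dimension embedding column.
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS document_chunks (
            id bigserial PRIMARY KEY,
            content text,
            embedding vector(1024)
        )
    """)

    # Store a chunk alongside its embedding (computed by your embeddings model).
    cur.execute(
        "INSERT INTO document_chunks (content, embedding) VALUES (%s, %s::vector)",
        ("example chunk", to_vec([0.1] * 1024)),
    )

    # Top-5 nearest chunks by cosine distance (pgvector's <=> operator).
    cur.execute(
        "SELECT content FROM document_chunks "
        "ORDER BY embedding <=> %s::vector LIMIT 5",
        (to_vec([0.1] * 1024),),
    )
    print(cur.fetchall())
```

Because this runs on the databases the team already operates, the only genuinely new surface area is the vector column type and distance operators.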

    Similarly, if the team is working with graph data, they can consider using Neptune Analytics, which seamlessly integrates with their existing AWS infrastructure and provides powerful graph querying and visualization features.

    In cases where the team deals with JSON documents and needs a scalable, fully managed document database, Amazon DocumentDB offers a compatible and familiar MongoDB experience, allowing them to use their existing skills and knowledge.

    If the team is experienced with Redis OSS and needs a highly scalable, in-memory database for real-time applications, consider using MemoryDB. MemoryDB provides a familiar Redis OSS-compatible interface, allowing the team to use their existing Redis OSS knowledge and client libraries while benefiting from the fully managed, durable, and scalable capabilities of MemoryDB.

In these examples, the database engineering team doesn’t need to onboard new database software, which would mean adding more developer tooling and integrations. Instead, they can focus on enabling new capabilities within their domain of expertise. Development best practices, operations, and query languages remain the same, reducing the number of new variables in the equation of successful outcomes. Another benefit of using your existing database is that it already aligns with the application’s requirements for security, availability, scalability, reliability, and performance. Moreover, your database administrators can use familiar technology, skills, and programming tools.

    If your current tech stack lacks vector search support, you can take advantage of serverless offerings to help fill the gap in your vector search needs. For example, OpenSearch Serverless with a quick create experience on the Amazon Bedrock console lets you get started without having to create or manage a cluster. Although onboarding a new technology is inevitable in this case, opting for serverless minimizes the management overhead.

    Ease of implementation

Beyond familiarity, the effort required to implement a given database is the next primary consideration in the evaluation process. A seamless integration process can minimize disruption and accelerate time to value for your database. In this section, we explore how to evaluate databases across core implementation focus areas: vectorization, management, access control, compliance, interface, and integrations.

    Vectorization

The foremost consideration for implementation is the process of populating your database with vector embeddings. If your data isn’t already represented as vectors, you’ll need to use an embedding model to convert it and store the results in a vector-enabled database. For example, when you use OpenSearch Service as a knowledge base in Amazon Bedrock, your generative AI application can take unstructured data stored in Amazon Simple Storage Service (Amazon S3), convert it to text chunks and vectors, and store it in OpenSearch Service automatically.
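As a sketch of the retrieval side of that managed workflow, the Knowledge Bases Retrieve API can be called through boto3 as follows; the knowledge base ID is hypothetical, and ingestion from Amazon S3, chunking, and embedding all happen on the service side:

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# "KB123EXAMPLE" is a hypothetical knowledge base ID backed by OpenSearch Service.
response = agent_runtime.retrieve(
    knowledgeBaseId="KB123EXAMPLE",
    retrievalQuery={"text": "What is our refund policy?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)

for result in response["retrievalResults"]:
    # Each result carries the retrieved chunk text and a relevance score.
    print(result["content"]["text"], result.get("score"))
```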

    Management

Day-to-day management is another consideration for overall implementation. You need to select a database that won’t overburden your existing team’s database management workload. For example, instead of taking on the management overhead of a self-managed database on Amazon Elastic Compute Cloud (Amazon EC2), you can opt for a serverless database that supports vector search, such as Aurora Serverless v2 or OpenSearch Serverless.
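For illustration, the following sketch provisions an Aurora PostgreSQL cluster with Serverless v2 capacity through boto3; the identifiers, engine version, and capacity range are assumptions to adjust for your workload:

```python
import boto3

rds = boto3.client("rds")

# Create an Aurora PostgreSQL cluster that scales between 0.5 and 16 ACUs.
rds.create_db_cluster(
    DBClusterIdentifier="vector-search-cluster",
    Engine="aurora-postgresql",
    EngineVersion="15.4",  # a pgvector-capable version; verify current options
    MasterUsername="postgres",
    ManageMasterUserPassword=True,  # let Secrets Manager manage the password
    ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 16},
)

# Serverless v2 capacity is attached via an instance of class db.serverless.
rds.create_db_instance(
    DBInstanceIdentifier="vector-search-instance-1",
    DBClusterIdentifier="vector-search-cluster",
    DBInstanceClass="db.serverless",
    Engine="aurora-postgresql",
)
```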

    Access control

    Access control is a critical consideration when integrating vector search into your existing infrastructure. In order to adhere to your current security standards, you should thoroughly evaluate the access control mechanisms offered by potential databases. For instance, if you have robust role-based access control (RBAC) for your non-vector Amazon DocumentDB implementation, choosing Amazon DocumentDB for vector search is ideal because it already aligns with your established access control requirements.

    Compliance

Compliance certifications are key evaluation criteria for a chosen database. If a database doesn’t meet essential compliance needs for your application, it’s a non-starter. AWS is backed by a deep set of cloud security tools, with over 300 security, compliance, and governance services and features. In addition, AWS supports 143 security standards and compliance certifications, including PCI-DSS, HIPAA/HITECH, FedRAMP, GDPR, FIPS 140-2, and NIST 800-171, helping you meet compliance requirements for virtually every regulatory agency around the globe and making sure your databases can meet your security and compliance needs.

    Interface

How your generative AI application will interact with your database is another implementation consideration, with implications for the general usability of your database. You need to evaluate how you will connect to and interact with the database, choosing an option with a simple, intuitive interface that helps meet your needs. For instance, Neptune Analytics simplifies vector search through API calls and stored procedures, making it an attractive choice if you prioritize a streamlined, user-friendly interface. For more details, refer to Working with vector similarity in Neptune Analytics.
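As a sketch of that interface, a vector similarity query can be issued against Neptune Analytics with openCypher through the AWS SDK. The graph identifier below is hypothetical, and the procedure name and signature follow the pattern in the Neptune Analytics documentation; verify the exact form against the current docs:

```python
import json
import boto3

graph = boto3.client("neptune-graph", region_name="us-east-1")

# "g-abc123" is a hypothetical graph identifier. topKByEmbedding is one of the
# documented Neptune Analytics vector similarity procedures; check its current
# signature before relying on it.
query = """
CALL neptune.algo.vectors.topKByEmbedding($embedding, {topK: 5})
YIELD node, score
RETURN node, score
"""
response = graph.execute_query(
    graphIdentifier="g-abc123",
    queryString=query,
    language="OPEN_CYPHER",
    parameters={"embedding": [0.1] * 1024},
)
print(json.loads(response["payload"].read()))
```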

    Integrations

Integration with Knowledge Bases for Amazon Bedrock is important if you’re looking to automate the data ingestion and runtime orchestration workflows. Both Aurora PostgreSQL-Compatible and OpenSearch Service are integrated with Knowledge Bases for Amazon Bedrock, with more to come. Similarly, integration with Amazon SageMaker enables a seamless, scalable, and customizable solution for building applications that rely on vector similarity search, personalized recommendations, or other vector-based operations, while using the power of machine learning (ML) and the AWS environment. In addition to Aurora PostgreSQL and OpenSearch Service, Amazon DocumentDB and Neptune Analytics are integrated with SageMaker.

    Additionally, open source frameworks like LangChain and LlamaIndex can be helpful for building LLM applications because they provide a powerful set of tools, abstractions, and utilities that simplify the development process, improve productivity, and enable developers to focus on creating value-added features. LangChain seamlessly integrates with various AWS databases, storage systems, and external APIs, in addition to being integrated with Amazon Bedrock. This integration allows developers to easily use AWS databases and Bedrock’s models within the LangChain framework. LlamaIndex supports Neptune Analytics as a vector store and graph databases for building GraphRAG applications. Similarly, Hugging Face is a popular platform that provides a wide range of pre-trained models, including BERT, GPT, and T5. Hugging Face is integrated with AWS services, allowing you to deploy models on AWS infrastructure and use them with databases like OpenSearch Service, Aurora PostgreSQL-Compatible, Neptune Analytics, or MemoryDB.
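As an illustration of how these pieces compose, here is a short RAG sketch with the langchain-aws and langchain-community packages; the endpoint, index, and model identifiers are assumptions, and OpenSearch Service deployments typically also require signed AWS authentication:

```python
from langchain_aws import BedrockEmbeddings, ChatBedrock
from langchain_community.vectorstores import OpenSearchVectorSearch

# Embeddings and chat model served through Amazon Bedrock.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")
llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0")

# Vector store backed by an OpenSearch Service domain (illustrative endpoint);
# production use would add AWS request signing via the client kwargs.
vector_store = OpenSearchVectorSearch(
    opensearch_url="https://my-domain.us-east-1.es.amazonaws.com",
    index_name="document-chunks",
    embedding_function=embeddings,
)

# Retrieve context, then generate an answer grounded in it.
question = "What is our refund policy?"
docs = vector_store.similarity_search(question, k=5)
context = "\n\n".join(d.page_content for d in docs)
print(llm.invoke(f"Context:\n{context}\n\nQuestion: {question}").content)
```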

    Scalability

Scalability is a key factor when evaluating databases, enabling production applications to run efficiently without disruption. The scalability of databases for vector search is tied to their ability to support high-dimensional vectors and vast numbers of embeddings. Different databases scale in different ways to support increased utilization; for example, the scaling mechanisms and engineering of Aurora PostgreSQL work differently from scaling on OpenSearch Service or MemoryDB. Understanding the scaling mechanisms of a database is essential to planning for the continued growth of your applications.

Consider the example of a music company building a rapidly growing music recommendation engine: OpenSearch Service makes it straightforward to operationalize the scalability of the recommendation engine by providing scale-out distributed infrastructure. Similarly, a global financial services company can use Amazon Aurora Global Database to build a scalable and resilient vector search solution for personalized investment recommendations. By deploying a primary database in one AWS Region and replicating it to multiple secondary Regions, the company can provide high availability, disaster recovery, and global application access with minimal latency, while delivering accurate and personalized recommendations to clients worldwide.

    AWS databases offer an extensive set of database engines with diverse scaling mechanisms to help meet the scaling demands of nearly any generative AI requirement.

    Performance

Another critical consideration is database performance. As organizations strive to extract valuable insights from vast amounts of high-dimensional vector data, the ability to run complex vector searches and operations at scale becomes paramount. When evaluating the performance of databases with vector capabilities, it’s important to assess the following characteristics (a measurement sketch follows the list):

    • Throughput – The number of queries processed per second
    • Recall – The relevance and completeness of retrieved vectors, which determines the accuracy of responses
    • Index build time – The duration required to build the vector index
    • Scale/cost – The ability to efficiently scale to billions of vectors while remaining cost-effective
    • p99 latency – The maximum latency for 99% of requests, which must meet response time expectations
    • Storage utilized – How efficiently the database stores high-dimensional vectors, which directly affects cost at scale
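The following sketch shows one way to estimate several of these metrics for any database behind a simple search callable. The `search_fn` interface is an assumption of this harness, and the ground-truth neighbors must be computed offline with an exact brute-force scan:

```python
import time
import numpy as np

def evaluate(search_fn, queries, ground_truth, k=10):
    """Estimate recall@k, p99 latency, and throughput for a vector search callable.

    search_fn(query_vector, k) should return a list of result IDs;
    ground_truth[i] holds the exact top-k IDs for queries[i].
    """
    recalls, latencies = [], []
    for query, expected in zip(queries, ground_truth):
        start = time.perf_counter()
        results = search_fn(query, k)
        latencies.append(time.perf_counter() - start)
        # Fraction of the true top-k that the database actually returned.
        recalls.append(len(set(results) & set(expected[:k])) / k)
    return {
        "mean_recall@k": float(np.mean(recalls)),
        "p99_latency_ms": float(np.percentile(latencies, 99) * 1000),
        "throughput_qps": len(queries) / sum(latencies),
    }
```

Running the same harness against each candidate database, with your own data and embedding dimensions, gives a like-for-like comparison of the trade-offs listed above.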

For example, a global bank building a real-time recommendation engine for its financial instruments needs to stay below an established end-to-end latency budget while delivering highly relevant vector search results at single-digit millisecond latencies for tens of thousands of concurrent users. In this scenario, MemoryDB is the right choice. The choice of indexing technique also significantly impacts query performance. Approximate nearest neighbor (ANN) techniques such as Hierarchical Navigable Small World (HNSW) and inverted file with flat compression (IVFFlat) trade a small amount of recall for substantially faster searches than exact k-NN techniques. Understanding these indexing methods is essential for choosing the best database for your specific use case and performance requirements. For example, Aurora Optimized Reads with NVMe caching, using HNSW indexing, can provide up to a nine-times increase in average query throughput compared to instances without Optimized Reads.
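As a concrete example of the two index types, here is how HNSW and IVFFlat indexes are created with pgvector; in practice you would pick one per column, and the tuning parameters shown are illustrative starting points:

```python
import os
import psycopg  # psycopg 3

with psycopg.connect(os.environ["DATABASE_URL"]) as conn, conn.cursor() as cur:
    # HNSW: better recall/latency trade-off, but slower to build and more memory.
    # m and ef_construction control graph connectivity and build-time effort.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS chunks_hnsw_idx ON document_chunks
        USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64)
    """)

    # IVFFlat: faster to build; `lists` should be tuned to the row count, and
    # the table should already contain data when the index is created.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS chunks_ivfflat_idx ON document_chunks
        USING ivfflat (embedding vector_cosine_ops)
        WITH (lists = 100)
    """)
```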

    AWS offers databases that can help fulfill the vector search performance needs of your application. These databases provide various performance optimization techniques and advanced monitoring tools, allowing organizations to fine-tune their vector data management solutions, address performance bottlenecks, and achieve consistent performance at scale. By using AWS databases, businesses can unlock the full potential of their vector search requirements, enabling real-time insights, personalized experiences, and innovative AI and ML applications that drive growth and innovation.

    High-level service characteristics

    Choosing a fully managed database on AWS to run your vector workloads, supported by the pillars of the Well-Architected Framework, offers significant advantages. It combines the scalability, security, and reliability of AWS infrastructure with the operational excellence and best practices around management, allowing businesses to use well-established database technologies while benefiting from streamlined operations, reduced overhead, and the ability to scale seamlessly as their needs evolve. The following are the characteristics that we have observed as pivotal in the database evaluation process:

    • Semantic search – Semantic search is an information retrieval technique that understands the meaning and context of search queries to deliver more relevant and accurate results. Semantic search is supported by many databases currently offered on AWS with vector search capabilities.
    • Serverless – The ability of a database to elastically scale to efficiently meet demand with little-to-no user management. Serverless is currently available for OpenSearch Service and Aurora PostgreSQL, and both are integrated with Knowledge Bases for Amazon Bedrock, which provides fully managed RAG workflows.
    • Dimensionality – Many AWS customers employ open source or custom embedding models that span a diverse range of dimensions; examples include Cohere Embed English v3 on Amazon Bedrock with 1,024 dimensions, and Amazon Titan Text Embeddings V2 with a choice of 256, 512, or 1,024 dimensions. AWS databases that support vector search accommodate these dimension sizes and are continuously innovating to deliver new standards of scalability and functionality.
    • Indexing – The most widely adopted indexing algorithm among our customer base is HNSW, and all AWS database services with vector search support HNSW. The IVFFlat indexing method is also supported by a subset of these database services.
    • Billion-scale vector workloads – As your vector workloads grow to support enterprise applications throughout 2024 and into 2025, our database services are equipped to handle billion-scale vector workloads.
    • Relevancy – To optimize your applications for their use cases, you must also confirm the relevancy of the vector search results, as measured by recall. All AWS database services with vector search support configurable recall in some capacity.
    • Hybrid search and pre-filtering – Many customers need to pre- and post-filter their vector search queries to focus on specific product categories, geographies, or other data subsections. AWS database services provide a layer of hybrid search or pre-filtering capabilities, with several, such as Aurora PostgreSQL, Amazon RDS for PostgreSQL, MemoryDB, and OpenSearch Service, going a step further and offering full-text search and hybrid search capabilities (see the sketch after this list).
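To illustrate the hybrid search and pre-filtering point, here is a sketch using the opensearch-py client that creates a k-NN index and runs a vector query with a full-text filter. The domain endpoint and index name are assumptions, filter support inside the knn clause depends on the engine and OpenSearch version, and a real OpenSearch Service client would add signed AWS authentication:

```python
from opensearchpy import OpenSearch

# Illustrative endpoint; production use would configure AWS request signing.
client = OpenSearch(hosts=["https://my-domain.us-east-1.es.amazonaws.com"])

# k-NN index with a 1,024-dimension HNSW field, matching models such as
# Titan Text Embeddings V2 or Cohere Embed English v3.
client.indices.create(
    index="document-chunks",
    body={
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "content": {"type": "text"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 1024,
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
            }
        },
    },
)

# Vector k-NN query combined with a full-text filter on the chunk content.
results = client.search(
    index="document-chunks",
    body={
        "query": {
            "knn": {
                "embedding": {
                    "vector": [0.1] * 1024,
                    "k": 10,
                    "filter": {"match": {"content": "refund"}},
                }
            }
        }
    },
)
```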

    Conclusion

    Selecting the right database for your generative AI application is crucial for success. While factors like familiarity, ease of implementation, scalability, and performance are important, we recognize the need for more specific guidance in this rapidly evolving space. Based on our current understanding and available options within the AWS managed database portfolio, we recommend the following:

    • If you’re already using OpenSearch Service, Aurora PostgreSQL, RDS for PostgreSQL, DocumentDB, or MemoryDB, leverage their vector search capabilities for your existing data.
    • For graph-based RAG applications, consider Amazon Neptune.
    • If your data is stored in DynamoDB, OpenSearch Service can be an excellent choice for vector search through the zero-ETL integration.
    • If you’re still unsure, use OpenSearch Service, which is the default vector store for Knowledge Bases for Amazon Bedrock.

The generative AI landscape is dynamic and continues to evolve rapidly. We encourage you to test different database services with your specific datasets and ML algorithms, considering how your data will grow over time so your solution can scale seamlessly with your workload.

    AWS offers a diverse range of database options, each with unique strengths. By leveraging AWS’s powerful ecosystem, organizations can empower their applications with efficient and scalable databases featuring vector storage and search capabilities, driving innovation and competitive advantage. AWS is committed to helping you navigate this journey. If you have further questions or need assistance in designing your optimal path forward for generative AI, don’t hesitate to contact our team of experts.

    About the Authors

    Shayon Sanyal is a Principal Database Specialist Solutions Architect and a Subject Matter Expert for Amazon’s flagship relational database, Amazon Aurora. He has over 15 years of experience managing relational databases and analytics workloads. Shayon’s relentless dedication to customer success allows him to help customers design scalable, secure and robust cloud native architectures. Shayon also helps service teams with design and delivery of pioneering features, such as Generative AI.

    Graham Kutchek is a Database Specialist Solutions Architect with expertise across all of Amazon’s database offerings. He is an industry specialist in media and entertainment, helping some of the largest media companies in the world run scalable, efficient, and reliable database deployments. Graham has a particular focus on graph databases, vector databases, and AI recommendation systems.
