    RAG architecture with Voyage AI embedding models on Amazon SageMaker JumpStart and Anthropic Claude 3 models

    May 14, 2024

    This post is a guest post co-written with Tengyu Ma and Wen Phan from Voyage AI.

    Organizations today have access to vast amounts of data, much of it proprietary, which holds the potential to unlock valuable insights when used effectively in generative artificial intelligence (AI) applications. Retrieval Augmented Generation (RAG) is a powerful technique designed to tap into this reservoir of information. By dynamically pulling relevant data from these extensive databases during the response generation process, RAG enables AI models to produce more accurate, relevant, and contextually rich outputs.

    Embedding models are crucial components in the RAG architecture, serving as the foundation for effectively identifying and retrieving the most relevant information from a large dataset. These models convert large volumes of text into compact numerical representations, allowing the system to quickly sift through and match query-related data with high precision. By facilitating a more efficient and accurate retrieval process, embedding models ensure that the generative component of RAG is fed the most pertinent information.

    In this post, we provide an overview of the state-of-the-art embedding models by Voyage AI and show a RAG implementation with Voyage AI’s text embedding model on Amazon SageMaker JumpStart, Anthropic’s Claude 3 model on Amazon Bedrock, and Amazon OpenSearch Service. Voyage AI’s embedding models are the preferred embedding models for Anthropic. In addition to general-purpose embedding models, Voyage AI offers domain-specific embedding models that are tuned to a particular domain.

    RAG architecture and embedding models

    RAG is the predominant design pattern for enterprise chatbots where a retrieval system fetches validated sources and documents that are pertinent to the query and inputs them to a large language model (LLM) to generate a response. It combines the generative capabilities of models with the informational breadth found in vast databases, enabling the model to pull relevant external documents to enhance its responses. This results in outputs that are not only contextually rich but also factually accurate, significantly boosting the reliability and utility of LLMs across diverse applications.

    Let’s briefly review the RAG workflow.

    RAG systems are empowered by semantic search using dense-vector representations of the documents called embeddings. These vectors are stored in a vector store, where they can be efficiently retrieved later. At query time, a query is also converted into a vector and then used to find and retrieve similar documents stored in the vector store via a k-nearest neighbor (k-NN) search against the document vector representations. Finally, the retrieved documents along with the query are used to prompt the generative model, often resulting in higher-quality responses and fewer hallucinations.
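
    To make the retrieval step concrete, here is a toy sketch of semantic search using cosine similarity over two-dimensional stand-in vectors (real embeddings have hundreds or thousands of dimensions, and a vector store would perform the search at scale):

    # Toy illustration of embedding-based retrieval (not the production stack)
    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Pretend these are document embeddings produced by an embedding model
    doc_embeddings = {"doc A": np.array([0.9, 0.1]), "doc B": np.array([0.2, 0.8])}
    query_embedding = np.array([0.85, 0.2])

    # k-NN retrieval ranks documents by similarity to the query vector
    ranked = sorted(
        doc_embeddings,
        key=lambda d: cosine_similarity(query_embedding, doc_embeddings[d]),
        reverse=True,
    )
    print(ranked)  # ['doc A', 'doc B']: doc A is the better match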

    Embedding models are neural network models that transform queries and documents into embeddings. Retrieval quality is determined largely by how well the data is represented as vectors, and the effectiveness of an embedding model is evaluated by its accuracy in retrieving relevant information. The retrieval quality of the embedding model is therefore highly correlated with the quality of the RAG system’s responses; to make your RAG system more successful, you should consider improving your embeddings. Check out this blog for a detailed explanation.

    Voyage AI’s general-purpose and domain-specific embedding models

    Voyage AI develops cutting-edge embedding models with state-of-the-art retrieval accuracy. voyage-large-2 is Voyage’s most powerful generalist embedding model, outperforming popular competing models. Voyage also offers voyage-2, a base generalist embedding model optimized for latency and quality. The following table summarizes the Voyage embedding models currently available on SageMaker JumpStart.

    Voyage AI Model | SageMaker JumpStart Model ID | Description
    voyage-2 | voyage-2-embedding | General-purpose embedding model optimized for a balance between cost, latency, and retrieval quality
    voyage-large-2 | voyage-large-2-embedding | General-purpose embedding model optimized for retrieval quality
    voyage-code-2 | voyage-code-2-embedding | Domain-specific embedding model optimized for code retrieval (17% better than alternatives)

    In addition to general-purpose embedding models, Voyage AI offers domain-specific ones that are tuned to a particular domain. These domain-specific embedding models are trained on massive domain-specific datasets, allowing them to deeply understand and excel in that domain. For example, Voyage’s code embedding model (voyage-code-2) outperforms general-purpose embedding models on code-related documents, achieving about a 15% improvement over the next best model. This gap over the next best general-purpose embedding model widens further for datasets requiring deeper code understanding. See voyage-code-2: Elevate Your Code Retrieval for voyage-code-2 details. More recently, Voyage released a legal embedding model (voyage-law-2) that is optimized for legal retrieval and tops the MTEB leaderboard for legal retrieval. See Domain-Specific Embeddings and Retrieval: Legal Edition (voyage-law-2) for voyage-law-2 details. Voyage AI plans to continue releasing domain-specific embedding models in the near future, including models for finance, healthcare, and multilingual retrieval. For a list of all available Voyage AI embedding models, see Embeddings.

    Voyage AI offers API endpoints for its embedding models, making it seamless to integrate them with the other components of your RAG stack. The Voyage AI embedding models are also available on AWS Marketplace and deployable as Amazon SageMaker endpoints within your own account and VPC, addressing security and compliance concerns. As part of SageMaker JumpStart, you can deploy Voyage AI embedding models with a few clicks and start running your RAG stack on AWS.

    Solution overview

    In this RAG solution, we use Voyage AI embedding models deployed with SageMaker JumpStart, with the Apple 2022 annual report (SEC Form 10-K) as the corpus to retrieve from. Specifically, we deploy the SageMaker model package of the voyage-large-2 model. For the LLM, we use the Anthropic Claude 3 Sonnet model on Amazon Bedrock. We use OpenSearch Service as the vector store. You can also follow along with the notebook.

    SageMaker JumpStart is the machine learning (ML) hub of SageMaker that offers one-click access to over 350 open source and third-party models. These models can be discovered and deployed through the Amazon SageMaker Studio UI or using the SageMaker Python SDK. SageMaker JumpStart provides notebooks to customize and deploy foundation models into your VPC.

    Anthropic’s Claude 3 models are the next generation of state-of-the-art models from Anthropic. For the vast majority of workloads, Sonnet is faster on inputs and outputs than Anthropic’s Claude 2 and 2.1 models, with higher levels of intelligence. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like Anthropic through an API, making it straightforward to build generative AI applications. To follow along, be sure to request model access to Anthropic Claude 3 Sonnet on Amazon Bedrock.

    Amazon OpenSearch Service is a managed service that makes it straightforward to deploy, operate, and scale OpenSearch, a popular open source distributed search and analytics suite derived from Elasticsearch. OpenSearch supports vector search via k-NN search.

    Prerequisites

    To follow along, you need to create an OpenSearch Service domain. For the purposes of this walkthrough, the Easy create option is fine. Keep the Enable fine-grained access control option selected. Select Create master user and provide a user name and password. After the domain has been created, the domain details will have the domain endpoint, which you’ll need—along with the user name and password—to access your OpenSearch instance. You don’t need to worry about creating an index or inserting data. We use the OpenSearch Python client to work with our vector store in the walkthrough.
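
    For reference, connecting to the domain with the OpenSearch Python client might look like the following sketch (the endpoint, user name, and password are placeholders for the values from your domain details):

    from opensearchpy import OpenSearch

    # Placeholders: use your domain endpoint and master user credentials
    opensearch_client = OpenSearch(
        hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
        http_auth=("master-user", "master-password"),
        use_ssl=True,
        verify_certs=True,
    )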

    Deploy the embedding model endpoint

    To use voyage-large-2, you need to subscribe to the SageMaker model package in AWS Marketplace. For instructions, see Subscribe to the model package. Choosing the model card in the SageMaker JumpStart UI will also bring you to the model listing page on AWS Marketplace.

    After you’re subscribed, you can initialize and deploy the embedding model as a SageMaker endpoint as follows:

    # Set embedding endpoint configuration
    (embedding_model_id, embedding_model_version, embedding_instance_type) = (
        "voyage-large-2-embedding",
        "*",
        "ml.g5.xlarge",  # See AWS Marketplace model package for supported instance types
    )

    # Instantiate embedding model from JumpStart
    from sagemaker.jumpstart.model import JumpStartModel

    embedding_model = JumpStartModel(
        model_id=embedding_model_id,
        model_version=embedding_model_version,
        instance_type=embedding_instance_type,
    )

    # Deploy model as inference endpoint. This can take several minutes (5 to 10 minutes)
    embedding_endpoint = embedding_model.deploy()
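
    Once the endpoint is up, a quick sanity check can confirm it returns embeddings. This sketch uses the same payload and response shape as the batching code later in this post:

    import json

    # Smoke test: embed a single string and inspect the embedding dimension
    test_payload = {"input": ["Hello, world!"], "input_type": "document", "truncation": "true"}
    test_response = embedding_endpoint.predict(json.dumps(test_payload))
    print(len(test_response["data"][0]["embedding"]))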

    Vectorize documents

    With the embedding endpoint deployed, you can index your documents for retrieval.

    Transform and chunk documents

    You need a list of strings to invoke the deployed voyage-large-2 model. For many documents, like our example annual report, each string is a semantically meaningful chunk of text. There are several ways you can load and chunk documents for vectorization. The code in this section is just one example; feel free to use what suits your data source and files.

    In this walkthrough, we load and chunk the source PDF file with the LangChain PyPDFLoader (which uses pypdf) and recursive character text splitter:

    from langchain_community.document_loaders import PyPDFLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    loader = PyPDFLoader("apple-10k-2022.pdf")
    document_chunks = loader.load_and_split(
        RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=100,
            length_function=len,
            is_separator_regex=False,
        )
    )

    In practice, selecting the text splitting chunk size and overlap requires some experimentation. There are many techniques for appropriately chunking documents for high-quality retrieval, but they are beyond the scope of this post.
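
    As a starting point, it helps to sanity-check a chunking configuration by looking at the chunk count, the length distribution, and a sample chunk:

    # Inspect the chunking before committing to a configuration
    chunk_lengths = [len(chunk.page_content) for chunk in document_chunks]
    print(f"{len(document_chunks)} chunks; min/max length: {min(chunk_lengths)}/{max(chunk_lengths)}")
    print(document_chunks[0].page_content[:200])  # eyeball a sample chunk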

    Generate document embeddings

    You can now vectorize your documents—or more precisely, your document chunks. See the following code:

    # Set batch size
    BATCH_SIZE = 45

    # Extract the raw text strings from the LangChain documents
    chunk_list = [chunk.page_content for chunk in document_chunks]

    # Vectorize chunks in batches
    index_list = []
    for i in range(0, len(chunk_list), BATCH_SIZE):
        docs_payload = {
            "input": chunk_list[i:i + BATCH_SIZE],
            "input_type": "document",
            "truncation": "true",
        }

        embed_docs_response = embedding_endpoint.predict(json.dumps(docs_payload))

        doc_embeddings_list = [d["embedding"] for d in embed_docs_response["data"]]
        index_list += [
            {"document": document, "embedding": embedding}
            for document, embedding in zip(chunk_list[i:i + BATCH_SIZE], doc_embeddings_list)
        ]

    Create a vector store index

    The next step is to create an OpenSearch vector search index and populate it with the document embeddings using the OpenSearch Python client.
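
    The index has to exist with a k-NN mapping before documents are inserted. Here is a minimal creation sketch; the index name is a placeholder, and the dimension must match the embedding model output (voyage-large-2 returns 1536-dimensional vectors per Voyage AI’s model documentation; verify for your model):

    # Create a k-NN-enabled index with a vector field matching the embedding size
    INDEX_NAME_OPENSEARCH = "apple-10k-2022"  # placeholder index name

    opensearch_client.indices.create(
        index=INDEX_NAME_OPENSEARCH,
        body={
            "settings": {"index": {"knn": True}},
            "mappings": {
                "properties": {
                    "document": {"type": "text"},
                    "embedding": {"type": "knn_vector", "dimension": 1536},
                }
            },
        },
    )

    With the index in place, insert each document chunk and its embedding: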

    # Populate index with document, embedding, and ID
    for doc_id, item in enumerate(index_list):
        index_response = opensearch_client.index(
            index=INDEX_NAME_OPENSEARCH,
            body={
                "document": item["document"],
                "embedding": item["embedding"],
            },
            id=doc_id,
            refresh=True,
        )

    Retrieve relevant documents

    With your indexed vector store, you can now use embeddings to find documents relevant to your query.
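
    The search payload below expects a query_embedding vector, so the query itself must be embedded first. The following sketch reuses the request shape used for documents, with input_type set to "query", which Voyage AI uses to distinguish queries from documents at embedding time:

    # Embed the query; note input_type "query" rather than "document"
    query = "What is the aggregate market value of the voting and non-voting stock held by non-affiliates of the Registrant, as of March 25, 2022?"
    query_payload = {
        "input": [query],
        "input_type": "query",
        "truncation": "true",
    }
    embed_query_response = embedding_endpoint.predict(json.dumps(query_payload))
    query_embedding = embed_query_response["data"][0]["embedding"]

    You can then run a k-NN search against the index: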

    # Set number of documents to retrieve
    TOP_K = 3

    # Set vector search payload
    vector_search_payload = {
        "size": TOP_K,
        "query": {"knn": {"embedding": {"vector": query_embedding, "k": TOP_K}}},
    }

    vector_search_response = opensearch_client.search(
        index=INDEX_NAME_OPENSEARCH,
        body=vector_search_payload,
    )

    The following is a formatted semantic search result of the top three most-relevant document chunks, indicating the index ID, similarity score, and the first several characters of the chunk:

    ID: 4
    Score: 0.7956404
    Document: under Section 404(b) of the Sarbanes-Oxley Act (15 U.S.C. 7262(b)) by the registered public accounting firm that prepared or issued its audit report. ☒
    Indicate by check mark whether the Registrant is a shell company (as defined in Rule 12b-2 of the Act).
    Yes ☐ No ☒
    The aggregate market value of the voting and non-voting stock held by non-affiliates of the Registrant, as of March 25, 2022, the last business day of the Registrant’s most recently completed second fiscal quarter, was approximately $2,830,067,000,000. Solely for purposes of this disclosure, shares of common stock held by executive officers and directors of the Registrant as of such date have been excluded because such persons may be deemed to be affiliates. This determination of executive officers and directors as affiliates is not necessarily a conclusive determination for any other purposes. 15,908,118,000 shares of common stock were issued and outstanding as of October 14, 2022.

    ID: 5
    Score: 0.7367379
    Document: 15,908,118,000 shares of common stock were issued and outstanding as of October 14, 2022.
    DOCUMENTS INCORPORATED BY REFERENCE
    Portions of the Registrant’s definitive proxy statement relating to its 2023 annual meeting of shareholders are incorporated by reference into Part III of this Annual Report on Form 10-K where indicated. The Registrant’s definitive proxy statement will be filed with the U.S. Securities and Exchange Commission within 120 days after the end of the fiscal year to which this report relates.

    ID: 178
    Score: 0.7263324
    Document: Note 3 – Financial Instruments
    Cash, Cash Equivalents and Marketable Securities
    The following tables show the Company’ s cash, cash equivalents and marketable securities by significant investment category as of September 24, 2022 and September 25, 2021 (in millions):
    2022
    Adjusted Cost
    Unrealized Gains
    Unrealized Losses
    Fair Value
    Cash and Cash Equivalents
    Current Marketable Securities
    Non-Current Marketable Securities
    Cash $ 18,546 $ — $ — $ 18,546 $ 18,546 $ — $ —
    Level 1 :
    Money market funds 2,929 — — 2,929 2,929 — —
    Mutual funds 274 — (47) 227 — 227 —
    Subtotal 3,203 — (47) 3,156 2,929 227 —
    Level 2 :
    U.S. Treasury securities 25,134 — (1,725) 23,409 338 5,091 17,980
    U.S. agency securities 5,823 — (655) 5,168 — 240 4,928
    Non-U.S. government securities 16,948 2 (1,201) 15,749 — 8,806 6,943 Certificates of deposit and time deposits 2,067 — — 2,067 1,805 262 —
    Commercial paper 718 — — 718 28 690 —
    Corporate debt securities 87,148 9 (7,707) 79,450 — 9,023 70,427

    The top retrieved document chunk (ID 4 with a score of 0.7956404) contains a statement that provides a direct answer to our query:

    The aggregate market value of the voting and non-voting stock held by non-affiliates of the Registrant, as of March 25, 2022, the last business day of the Registrant’s most recently completed second fiscal quarter, was approximately $2,830,067,000,000.

    This additional context will enable Claude to provide a response that answers your query.

    Generate a retrieval augmented response

    You can now prompt Claude to use the retrieved documents to answer your query.
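
    The prompt template below interpolates a context variable, which can be assembled from the search response (a sketch based on the standard OpenSearch response shape):

    # Concatenate the retrieved chunks into a single context string
    context = "\n\n".join(
        hit["_source"]["document"] for hit in vector_search_response["hits"]["hits"]
    )

    The retrieved text is then interpolated, along with the query, into a retrieval-augmented prompt: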

    # Create retrieval-augmented prompt
    rag_prompt = f"""Human:

    INSTRUCTIONS:
    Answer the QUERY using the CONTEXT text provided below. Keep your answer
    grounded in the facts of the CONTEXT. If the CONTEXT doesn't contain the
    facts to answer the QUERY just respond with "I do not have enough context
    to respond to this query.".

    QUERY: {query}

    CONTEXT: {context}

    Assistant:
    """

    Next, initialize the Amazon Bedrock client to invoke the Anthropic Claude 3 Sonnet model in us-east-1.

    import boto3

    # List available LLMs on Amazon Bedrock
    bedrock_client = boto3.client("bedrock", region_name="us-east-1")
    bedrock_fm = bedrock_client.list_foundation_models()
    print([(m["modelId"], m["modelName"]) for m in bedrock_fm["modelSummaries"]])

    # Set Bedrock LLM model ID
    LLM_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

    bedrock_runtime_client = boto3.client("bedrock-runtime", region_name="us-east-1")
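
    The invocation below sends a rag_payload request body. Claude 3 models on Amazon Bedrock use the Anthropic Messages API, so a minimal construction might look like this sketch (max_tokens is an illustrative value):

    # Build the request body using the Bedrock Messages API format for Claude 3
    rag_payload = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{"role": "user", "content": rag_prompt}],
    }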
    import botocore.exceptions

    # Invoke LLM inference
    try:
        llm_rag_response = bedrock_runtime_client.invoke_model(
            modelId=LLM_MODEL_ID,
            body=json.dumps(rag_payload),
            accept="application/json",
            contentType="application/json",
        )
    except botocore.exceptions.ClientError as error:
        print("Client error")
        raise error
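
    The generated text can be read by parsing the streaming response body (a sketch assuming the Messages API payload shape above):

    # Parse the generated text out of the response
    llm_rag_response_body = json.loads(llm_rag_response["body"].read())
    print(llm_rag_response_body["content"][0]["text"])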

    Claude produces a correct response that answers the query using the context provided:

    According to the context provided, “The aggregate market value of the voting and non-voting stock held by non-affiliates of the Registrant, as of March 25, 2022, the last business day of the Registrant’s most recently completed second fiscal quarter, was approximately $2,830,067,000,000.”

    Generate a response without retrieved documents

    For comparison, let’s see how Claude will respond to a query directly—without any additional context. Specifically, we ask “What is the aggregate market value of the voting and non-voting stock held by non-affiliates of the Registrant, as of March 25, 2022?”

    The following is Claude’s response:

    Unfortunately, without access to the specific financial filings or reports for the company you are asking about, I do not have enough information to determine the aggregate market value of the voting and non-voting stock held by non-affiliates as of March 25, 2022.

    Public companies are required to disclose this information in their annual reports and other SEC filings, but I do not have access to those documents for the particular company or “Registrant” you are referring to. The aggregate market value calculation takes into account things like the number of outstanding shares, recent stock prices, and which shareholders are considered affiliates versus non-affiliates.

    Unless you are able to provide the company’s name and the specific report containing this data point, I cannot lookup or calculate that aggregate market value figure. My role is to provide helpful information to you, but I do not have a way to access each public company’s proprietary financial data directly. Please let me know if you can share more details about the company in question.

    Claude indicates it does not have enough context, especially because the query didn’t explicitly indicate the registrant (the company).

    Clean up

    If you’ve been following along, don’t forget to delete the SageMaker endpoint and the OpenSearch Service domain when you’re done so you don’t incur unnecessary costs:

    embedding_endpoint.delete_model()
    embedding_endpoint.delete_endpoint()
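
    The OpenSearch Service domain can be deleted on the AWS Management Console, or programmatically with a sketch like the following (the domain name is a placeholder):

    import boto3

    # Delete the OpenSearch Service domain to stop domain charges
    opensearch_service_client = boto3.client("opensearch", region_name="us-east-1")
    opensearch_service_client.delete_domain(DomainName="my-rag-domain")  # placeholder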

    Conclusion

    Embeddings are at the heart of a RAG system, and Voyage AI offers the best general-purpose and domain-specific embedding models today. Get started or level up your existing RAG stack on AWS today with Voyage AI embedding models—it’s seamless with SageMaker JumpStart. You can try the notebook in this post yourself. Learn more about Voyage AI and follow them on X (Twitter) or LinkedIn for updates!

    About the Authors

    Tengyu Ma is CEO and Co-Founder of Voyage AI and an assistant professor of computer science at Stanford University. His research interests broadly include topics in machine learning, algorithms and their theory, such as deep learning, (deep) reinforcement learning, pre-training / foundation models, robustness, non-convex optimization, distributed optimization, and high-dimensional statistics. Tengyu earned his PhD from Princeton University and has worked at Facebook and Google as a visiting scientist.

    Wen Phan is Head of Product at Voyage AI and has spent the last decade developing and commercializing AI and data products for enterprises. He has worked with hundreds of users and organizations around the world to apply AI and data to their use cases in financial services, healthcare, defense, and technology, to name a few. Wen holds a B.S. in electrical engineering and M.S. in analytics and decision sciences. Personally, he enjoys spinning hip-hop records, dining out, and spending time with his wife and two kids — oh, and guzzling cookies and cream milkshakes, too!

    Vivek Gangasani is an AI/ML Solutions Architect working with generative AI startups on AWS. He helps world-leading AI startups train, host, and operationalize LLMs to build innovative generative AI solutions. Currently, he is focused on developing strategies for fine-tuning and optimizing inference performance at scale for LLMs. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.
