
    How to Build a Powerful and Intelligent Question-Answering System by Using Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain Framework

    May 18, 2025

    In this tutorial, we demonstrate how to build a powerful and intelligent question-answering system by combining the strengths of the Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain framework. The pipeline leverages real-time web search using Tavily, semantic document caching with the Chroma vector store, and contextual response generation through the Gemini model. These tools are integrated through LangChain’s modular components, such as RunnableLambda, ChatPromptTemplate, ConversationBufferMemory, and GoogleGenerativeAIEmbeddings. The system goes beyond simple Q&A by introducing a hybrid retrieval mechanism that checks for cached embeddings before invoking a fresh web search. Retrieved documents are intelligently formatted, summarized, and passed through a structured LLM prompt, with attention to source attribution, user history, and confidence scoring. Features such as advanced prompt engineering, sentiment and entity analysis, and dynamic vector-store updates make this pipeline suitable for advanced use cases like research assistance, domain-specific summarization, and intelligent agents.

    !pip install -qU langchain-community tavily-python langchain-google-genai streamlit matplotlib pandas tiktoken chromadb langchain_core pydantic langchain

    We install and upgrade a comprehensive set of libraries required to build an advanced AI search assistant. These include tools for retrieval (tavily-python, chromadb), LLM integration (langchain-google-genai, langchain), data handling (pandas, pydantic), visualization (matplotlib, streamlit), and tokenization (tiktoken). Together, they form the core foundation for constructing a real-time, context-aware QA system.

    import os
    import getpass
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import json
    import time
    from typing import List, Dict, Any, Optional
    from datetime import datetime

    We import the essential Python libraries used throughout the notebook. These include standard libraries for environment variables, secure input, time tracking, and data types (os, getpass, time, typing, datetime), core data science tools such as pandas, matplotlib, and numpy for data handling, visualization, and numerical computation, and json for parsing structured data.

    if "TAVILY_API_KEY" not in os.environ:
        os.environ["TAVILY_API_KEY"] = getpass.getpass("Enter Tavily API key: ")
       
    if "GOOGLE_API_KEY" not in os.environ:
        os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter Google API key: ")
    
    
    import logging
    logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    logger = logging.getLogger(__name__)

    We securely initialize API keys for Tavily and Google Gemini by prompting users only if they’re not already set in the environment, ensuring safe and repeatable access to external services. It also configures a standardized logging setup using Python’s logging module, which helps monitor execution flow and capture debug or error messages throughout the notebook.

    from langchain_community.retrievers import TavilySearchAPIRetriever
    from langchain_community.vectorstores import Chroma
    from langchain_core.documents import Document
    from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
    from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
    from langchain_core.runnables import RunnablePassthrough, RunnableLambda
    from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.chains.summarize import load_summarize_chain
    from langchain.memory import ConversationBufferMemory

    We import key components from the LangChain ecosystem and its integrations. It brings in the TavilySearchAPIRetriever for real-time web search, Chroma for vector storage, and GoogleGenerativeAI modules for chat and embedding models. Core LangChain modules like ChatPromptTemplate, RunnableLambda, ConversationBufferMemory, and output parsers enable flexible prompt construction, memory handling, and pipeline execution.

    class SearchQueryError(Exception):
        """Exception raised for errors in the search query."""
        pass
    
    
    def format_docs(docs):
        formatted_content = []
        for i, doc in enumerate(docs):
            metadata = doc.metadata
            source = metadata.get('source', 'Unknown source')
            title = metadata.get('title', 'Untitled')
            score = metadata.get('score', 0)
           
            formatted_content.append(
                f"Document {i+1} [Score: {score:.2f}]:\n"
                f"Title: {title}\n"
                f"Source: {source}\n"
                f"Content: {doc.page_content}\n"
            )

        return "\n\n".join(formatted_content)

    We define two essential components for search and document handling. The SearchQueryError class creates a custom exception to manage invalid or failed search queries gracefully. The format_docs function processes a list of retrieved documents by extracting metadata such as title, source, and relevance score and formatting them into a clean, readable string.
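
    As a quick sanity check, here is a minimal sketch (assuming the imports and definitions above have already run) that feeds format_docs a hand-built Document; the metadata values and URL are purely illustrative.

    # Illustrative only: exercise format_docs on a manually constructed Document
    sample_docs = [
        Document(
            page_content="Breath of the Wild launched in March 2017.",
            metadata={"source": "https://example.com/zelda", "title": "Zelda facts", "score": 0.91},
        )
    ]
    print(format_docs(sample_docs))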

    class SearchResultsParser:
        def parse(self, text):
            try:
                if isinstance(text, str):
                    import re
                    import json
                    json_match = re.search(r'{.*}', text, re.DOTALL)
                    if json_match:
                        json_str = json_match.group(0)
                        return json.loads(json_str)
                    return {"answer": text, "sources": [], "confidence": 0.5}
                elif hasattr(text, 'content'):
                    return {"answer": text.content, "sources": [], "confidence": 0.5}
                else:
                    return {"answer": str(text), "sources": [], "confidence": 0.5}
            except Exception as e:
                logger.warning(f"Failed to parse JSON: {e}")
                return {"answer": str(text), "sources": [], "confidence": 0.5}

    The SearchResultsParser class provides a robust method for extracting structured information from LLM responses. It attempts to parse a JSON-like string from the model output, falling back to a plain-text response format if parsing fails. It gracefully handles both string outputs and message objects, ensuring consistent downstream processing. In case of errors, it logs a warning and returns a fallback response containing the raw answer, empty sources, and a default confidence score, which enhances the system’s fault tolerance.
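
    To illustrate that fallback behavior, a small sketch (using made-up response strings) shows how the parser handles both a JSON-bearing reply and plain text; it assumes the class defined above is already in scope.

    # Illustrative strings only: JSON is extracted when present, otherwise the default shape is returned
    parser = SearchResultsParser()
    print(parser.parse('{"answer": "42", "sources": ["doc1"], "confidence": 0.9}'))
    print(parser.parse("A plain-text reply with no JSON in it"))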

    class EnhancedTavilyRetriever:
        def __init__(self, api_key=None, max_results=5, search_depth="advanced", include_domains=None, exclude_domains=None):
            self.api_key = api_key
            self.max_results = max_results
            self.search_depth = search_depth
            self.include_domains = include_domains or []
            self.exclude_domains = exclude_domains or []
            self.retriever = self._create_retriever()
            self.previous_searches = []
           
        def _create_retriever(self):
            try:
                return TavilySearchAPIRetriever(
                    api_key=self.api_key,
                    k=self.max_results,
                    search_depth=self.search_depth,
                    include_domains=self.include_domains,
                    exclude_domains=self.exclude_domains
                )
            except Exception as e:
                logger.error(f"Failed to create Tavily retriever: {e}")
                raise
       
        def invoke(self, query, **kwargs):
            if not query or not query.strip():
                raise SearchQueryError("Empty search query")
           
            try:
                start_time = time.time()
                results = self.retriever.invoke(query, **kwargs)
                end_time = time.time()
               
                search_record = {
                    "timestamp": datetime.now().isoformat(),
                    "query": query,
                    "num_results": len(results),
                    "response_time": end_time - start_time
                }
                self.previous_searches.append(search_record)
               
                return results
            except Exception as e:
                logger.error(f"Search failed: {e}")
                raise SearchQueryError(f"Failed to perform search: {str(e)}")
       
        def get_search_history(self):
            return self.previous_searches

    The EnhancedTavilyRetriever class is a custom wrapper around the TavilySearchAPIRetriever, adding greater flexibility, control, and traceability to search operations. It supports advanced features like limiting search depth, domain inclusion/exclusion filters, and configurable result counts. The invoke method performs web searches and tracks each query’s metadata (timestamp, response time, and result count), storing it for later analysis.
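
    A brief usage sketch (it requires a valid TAVILY_API_KEY in the environment, and the query string is illustrative) shows how the wrapper is instantiated and how its search log can be inspected afterwards.

    # Sketch: run one search through the wrapper and inspect the most recent log entry
    demo_retriever = EnhancedTavilyRetriever(max_results=2, search_depth="basic")
    demo_docs = demo_retriever.invoke("LangChain Tavily integration")
    print(f"{len(demo_docs)} documents retrieved")
    print(demo_retriever.get_search_history()[-1])  # timestamp, query, num_results, response_time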

    class SearchCache:
        def __init__(self):
            self.embedding_function = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
            self.vector_store = None
            self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
           
        def add_documents(self, documents):
            if not documents:
                return
           
            try:
                if self.vector_store is None:
                    self.vector_store = Chroma.from_documents(
                        documents=documents,
                        embedding=self.embedding_function
                    )
                else:
                    self.vector_store.add_documents(documents)
            except Exception as e:
                logger.error(f"Failed to add documents to cache: {e}")
       
        def search(self, query, k=3):
            if self.vector_store is None:
                return []
           
            try:
                return self.vector_store.similarity_search(query, k=k)
            except Exception as e:
                logger.error(f"Vector search failed: {e}")
                return []

    The SearchCache class implements a semantic caching layer that stores and retrieves documents using vector embeddings for efficient similarity search. It uses GoogleGenerativeAIEmbeddings to convert documents into dense vectors and stores them in a Chroma vector database. The add_documents method initializes or updates the vector store, while the search method enables fast retrieval of the most relevant cached documents based on semantic similarity. This reduces redundant API calls and improves response times for repeated or related queries, serving as a lightweight hybrid memory layer in the AI assistant pipeline.
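
    The round trip through the cache can be sketched as follows (assuming GOOGLE_API_KEY is set; the sample document is illustrative).

    # Sketch: add one document to the cache, then retrieve it by semantic similarity
    demo_cache = SearchCache()
    demo_cache.add_documents([
        Document(page_content="Chroma stores embeddings in a local vector database.",
                 metadata={"title": "Chroma notes", "source": "example", "score": 1.0}),
    ])
    hits = demo_cache.search("Where are the embeddings stored?", k=1)
    print(hits[0].page_content if hits else "no cache hit")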

    search_cache = SearchCache()
    enhanced_retriever = EnhancedTavilyRetriever(max_results=5)
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    
    
    system_template = """You are a research assistant that provides accurate answers based on the search results provided.
    Follow these guidelines:
    1. Only use the context provided to answer the question
    2. If the context doesn't contain the answer, say "I don't have sufficient information to answer this question."
    3. Cite your sources by referencing the document numbers
    4. Don't make up information
    5. Keep the answer concise but complete
    
    
    Context: {context}
    Chat History: {chat_history}
    """
    
    
    system_message = SystemMessagePromptTemplate.from_template(system_template)
    human_template = "Question: {question}"
    human_message = HumanMessagePromptTemplate.from_template(human_template)
    
    
    prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    

    We initialize the core components of the AI assistant: a semantic SearchCache, the EnhancedTavilyRetriever for web-based querying, and a ConversationBufferMemory to retain chat history across turns. It also defines a structured prompt using ChatPromptTemplate, guiding the LLM to act as a research assistant. The prompt enforces strict rules for factual accuracy, context usage, source citation, and concise answering, ensuring reliable and grounded responses.
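
    To preview what the model will actually receive, a small sketch (with placeholder context and an illustrative question) renders the prompt and prints the resulting messages.

    # Sketch: render the prompt with placeholder values and inspect the messages
    preview = prompt.invoke({
        "context": "Document 1 [Score: 0.90]: example context goes here",
        "chat_history": [],
        "question": "What does the context say?",
    })
    for message in preview.to_messages():
        print(message.type, ":", message.content[:80])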

    def get_llm(model_name="gemini-2.0-flash-lite", temperature=0.2, response_mode="json"):
        try:
            return ChatGoogleGenerativeAI(
                model=model_name,
                temperature=temperature,
                convert_system_message_to_human=True,
                top_p=0.95,
                top_k=40,
                max_output_tokens=2048
            )
        except Exception as e:
            logger.error(f"Failed to initialize LLM: {e}")
            raise
    
    
    output_parser = SearchResultsParser()
    

    We define the get_llm function, which initializes a Google Gemini language model with configurable parameters such as model name, temperature, and decoding settings (e.g., top_p, top_k, and max tokens). It ensures robustness with error handling for failed model initialization. An instance of SearchResultsParser is also created to standardize and structure the LLM’s raw responses, enabling consistent downstream processing of answers and metadata.
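
    A one-off call to the configured model can be sketched as follows (it requires GOOGLE_API_KEY and uses the default model name from get_llm; the prompt text is illustrative).

    # Sketch: a single direct call to the Gemini chat model
    demo_llm = get_llm(temperature=0)
    reply = demo_llm.invoke("Reply with the single word: ready")
    print(reply.content)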

    def plot_search_metrics(search_history):
        if not search_history:
            print("No search history available")
            return
       
        df = pd.DataFrame(search_history)
       
        plt.figure(figsize=(12, 6))
        plt.subplot(1, 2, 1)
        plt.plot(range(len(df)), df['response_time'], marker='o')
        plt.title('Search Response Times')
        plt.xlabel('Search Index')
        plt.ylabel('Time (seconds)')
        plt.grid(True)
       
        plt.subplot(1, 2, 2)
        plt.bar(range(len(df)), df['num_results'])
        plt.title('Number of Results per Search')
        plt.xlabel('Search Index')
        plt.ylabel('Number of Results')
        plt.grid(True)
       
        plt.tight_layout()
        plt.show()
    

    The plot_search_metrics function visualizes performance trends from past queries using Matplotlib. It converts the search history into a DataFrame and draws two subplots: one showing response time per search and the other showing the number of results returned. This aids in analyzing the system’s efficiency and search quality over time, helping developers fine-tune the retriever or identify bottlenecks in real-world usage.
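
    Because the function only needs a list of dictionaries, it can be previewed with a synthetic history before any live searches are made; the entries below are made up for illustration.

    # Sketch: plot against a hand-built history to preview the charts
    fake_history = [
        {"timestamp": datetime.now().isoformat(), "query": "q1", "num_results": 5, "response_time": 1.2},
        {"timestamp": datetime.now().isoformat(), "query": "q2", "num_results": 3, "response_time": 0.8},
    ]
    plot_search_metrics(fake_history)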

    def retrieve_with_fallback(query):
        cached_results = search_cache.search(query)
       
        if cached_results:
            logger.info(f"Retrieved {len(cached_results)} documents from cache")
            return cached_results
       
        logger.info("No cache hit, performing web search")
        search_results = enhanced_retriever.invoke(query)
       
        search_cache.add_documents(search_results)
       
        return search_results
    
    
    def summarize_documents(documents, query):
        llm = get_llm(temperature=0)
       
        summarize_prompt = ChatPromptTemplate.from_template(
            """Create a concise summary of the following documents related to this query: {query}
           
            {documents}
           
            Provide a comprehensive summary that addresses the key points relevant to the query.
            """
        )
       
        chain = (
            {"documents": lambda docs: format_docs(docs), "query": lambda _: query}
            | summarize_prompt
            | llm
            | StrOutputParser()
        )
       
        return chain.invoke(documents)

    These two functions enhance the assistant’s intelligence and efficiency. The retrieve_with_fallback function implements a hybrid retrieval mechanism: it first attempts to fetch semantically relevant documents from the local Chroma cache and, if unsuccessful, falls back to a real-time Tavily web search, caching the new results for future use. Meanwhile, summarize_documents leverages a Gemini LLM to generate concise summaries from retrieved documents, guided by a structured prompt that ensures relevance to the query. Together, they enable low-latency, informative, and context-aware responses.
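
    A combined usage sketch (both API keys must be live, and the query text is illustrative) runs a cache-first retrieval and then summarizes whatever comes back.

    # Sketch: cache-first retrieval followed by a query-focused summary
    docs = retrieve_with_fallback("latest LangChain release notes")
    print(f"{len(docs)} documents retrieved")
    print(summarize_documents(docs, "latest LangChain release notes"))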

    def advanced_chain(query_engine="enhanced", model="gemini-1.5-pro", include_history=True):
        llm = get_llm(model_name=model)
       
        if query_engine == "enhanced":
            retriever = lambda query: retrieve_with_fallback(query)
        else:
            retriever = enhanced_retriever.invoke
       
        def chain_with_history(input_dict):
            query = input_dict["question"]
            chat_history = memory.load_memory_variables({})["chat_history"] if include_history else []
           
            docs = retriever(query)
           
            context = format_docs(docs)
           
            result = prompt.invoke({
                "context": context,
                "question": query,
                "chat_history": chat_history
            })
           
            response = llm.invoke(result)
            memory.save_context({"input": query}, {"output": response.content})

            return response
       
        return RunnableLambda(chain_with_history) | StrOutputParser()

    The advanced_chain function defines a modular, end-to-end reasoning workflow for answering user queries using cached or real-time search. It initializes the specified Gemini model, selects the retrieval strategy (cached fallback or direct search), constructs a response pipeline incorporating chat history (if enabled), formats documents into context, and prompts the LLM using a system-guided template. The chain also logs the interaction in memory and returns the final answer, parsed into clean text. This design enables flexible experimentation with models and retrieval strategies while maintaining conversation coherence.
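
    As a sketch of that flexibility (live Tavily and Google API keys are required, and the question is illustrative), a variant chain can be built that bypasses the cache and skips chat history: any query_engine value other than "enhanced" routes straight to the Tavily retriever.

    # Sketch: a direct-search variant without conversation history
    direct_chain = advanced_chain(query_engine="direct", model="gemini-1.5-pro", include_history=False)
    print(direct_chain.invoke({"question": "Who develops the Zelda series?"}))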

    qa_chain = advanced_chain()
    
    
    def analyze_query(query):
        llm = get_llm(temperature=0)
       
        analysis_prompt = ChatPromptTemplate.from_template(
            """Analyze the following query and provide:
            1. Main topic
            2. Sentiment (positive, negative, neutral)
            3. Key entities mentioned
            4. Query type (factual, opinion, how-to, etc.)
           
            Query: {query}
           
            Return the analysis in JSON format with the following structure:
            {{
                "topic": "main topic",
                "sentiment": "sentiment",
                "entities": ["entity1", "entity2"],
                "type": "query type"
            }}
            """
        )
       
        chain = analysis_prompt | llm | RunnableLambda(output_parser.parse)  # wrap the custom parser so it composes in the chain
       
        return chain.invoke({"query": query})
    
    
    print("Advanced Tavily-Gemini Implementation")
    print("="*50)
    
    
    query = "what year was breath of the wild released and what was its reception?"
    print(f"Query: {query}")

    We initialize the final components of the intelligent assistant. qa_chain is the assembled reasoning pipeline ready to process user queries using retrieval, memory, and Gemini-based response generation. The analyze_query function performs a lightweight semantic analysis on a query, extracting the main topic, sentiment, entities, and query type using the Gemini model and a structured JSON prompt. The example query, about Breath of the Wild’s release and reception, showcases how the assistant is triggered and prepared for full-stack inference and semantic interpretation. The printed heading marks the start of interactive execution.
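
    The analysis step can also be exercised on its own; this sketch (with an illustrative query) prints the dictionary produced by analyze_query, which follows the JSON structure requested in the prompt.

    # Sketch: standalone query analysis; expect keys "topic", "sentiment", "entities", "type"
    print(analyze_query("How do I enable dark mode on Android?"))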

    try:
        print("nSearching for answer...")
        answer = qa_chain.invoke({"question": query})
        print("nAnswer:")
        print(answer)
       
        print("nAnalyzing query...")
        try:
            query_analysis = analyze_query(query)
            print("nQuery Analysis:")
            print(json.dumps(query_analysis, indent=2))
        except Exception as e:
            print(f"Query analysis error (non-critical): {e}")
    except Exception as e:
        print(f"Error in search: {e}")
    
    
    history = enhanced_retriever.get_search_history()
    print("nSearch History:")
    for i, h in enumerate(history):
        print(f"{i+1}. Query: {h['query']} - Results: {h['num_results']} - Time: {h['response_time']:.2f}s")
    
    
    print("nAdvanced search with domain filtering:")
    specialized_retriever = EnhancedTavilyRetriever(
        max_results=3,
        search_depth="advanced",
        include_domains=["nintendo.com", "zelda.com"],
        exclude_domains=["reddit.com", "twitter.com"]
    )
    
    
    try:
        specialized_results = specialized_retriever.invoke("breath of the wild sales")
        print(f"Found {len(specialized_results)} specialized results")
       
        summary = summarize_documents(specialized_results, "breath of the wild sales")
        print("nSummary of specialized results:")
        print(summary)
    except Exception as e:
        print(f"Error in specialized search: {e}")
    
    
    print("nSearch Metrics:")
    plot_search_metrics(history)
    

    We demonstrate the complete pipeline in action. It performs a search using the qa_chain, displays the generated answer, and then analyzes the query for sentiment, topic, entities, and type. It also retrieves and prints each query’s search history, response time, and result count. Also, it runs a domain-filtered search focused on Nintendo-related sites, summarizes the results, and visualizes search performance using plot_search_metrics, offering a comprehensive view of the assistant’s capabilities in real-time use.

    In conclusion, following this tutorial gives users a comprehensive blueprint for creating a highly capable, context-aware, and scalable RAG system that bridges real-time web intelligence with conversational AI. The Tavily Search API lets users directly pull fresh and relevant content from the web. The Gemini LLM adds robust reasoning and summarization capabilities, while LangChain’s abstraction layer allows seamless orchestration between memory, embeddings, and model outputs. The implementation includes advanced features such as domain-specific filtering, query analysis (sentiment, topic, and entity extraction), and fallback strategies using a semantic vector cache built with Chroma and GoogleGenerativeAIEmbeddings. Also, structured logging, error handling, and analytics dashboards provide transparency and diagnostics for real-world deployment.


    Check out the Colab Notebook. All credit for this research goes to the researchers of this project.

    The post How to Build a Powerful and Intelligent Question-Answering System by Using Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain Framework appeared first on MarkTechPost.
