
    A Coding Implementation to Build a Conversational Research Assistant with FAISS, Langchain, Pypdf, and TinyLlama-1.1B-Chat-v1.0

    March 23, 2025

    Conversational research assistants powered by Retrieval-Augmented Generation (RAG) address the limitations of traditional language models by pairing them with an information retrieval system. Such an assistant searches a specific knowledge base, retrieves the relevant passages, and presents them conversationally with proper citations. This approach reduces hallucinations, handles domain-specific knowledge, and grounds responses in the retrieved text. In this tutorial, we will build such an assistant using the open-source model TinyLlama-1.1B-Chat-v1.0 from Hugging Face, FAISS from Meta, and the LangChain framework, and use it to answer questions about scientific papers.
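    Conceptually, the whole pipeline reduces to three steps: retrieve relevant chunks, assemble them into a grounded prompt, and generate an answer with citations. The sketch below is purely illustrative; names like retriever and llm stand in for the concrete components we build later in this tutorial:

    # Schematic RAG loop (illustrative only; concrete versions follow below)
    def rag_answer(query, retriever, llm):
        docs = retriever(query)                              # 1. retrieve relevant chunks
        context = "\n\n".join(d.page_content for d in docs)  # 2. assemble grounded context
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        return llm(prompt), docs                             # 3. generate answer plus sources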

    First, let’s install the necessary libraries:

    !pip install langchain-community langchain pypdf sentence-transformers faiss-cpu transformers accelerate einops

    Now, let’s import the required libraries: 

    import os
    import torch
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import FAISS
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain.chains import ConversationalRetrievalChain
    from langchain_community.llms import HuggingFacePipeline
    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
    import pandas as pd 
    from IPython.display import display, Markdown

    Next, we mount Google Drive so that files can be saved there in a later step:

    from google.colab import drive
    drive.mount('/content/drive')
    print("Google Drive mounted")

    For our knowledge base, we’ll use PDF documents of scientific papers. Let’s create a function to load and process these documents:

    def load_documents(pdf_folder_path):
        documents = []

        if not pdf_folder_path:
            print("Downloading a sample paper...")
            !wget -q https://arxiv.org/pdf/1706.03762.pdf -O attention.pdf
            pdf_docs = ["attention.pdf"]
        else:
            pdf_docs = [os.path.join(pdf_folder_path, f) for f in os.listdir(pdf_folder_path)
                        if f.endswith('.pdf')]

        print(f"Found {len(pdf_docs)} PDF documents")

        for pdf_path in pdf_docs:
            try:
                loader = PyPDFLoader(pdf_path)
                documents.extend(loader.load())
                print(f"Loaded: {pdf_path}")
            except Exception as e:
                print(f"Error loading {pdf_path}: {e}")

        return documents


    documents = load_documents("")

    Next, we need to split these documents into smaller chunks for efficient retrieval:

    def split_documents(documents):
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,      # characters per chunk
            chunk_overlap=200,    # overlap preserves context across chunk boundaries
            length_function=len,
        )
        chunks = text_splitter.split_documents(documents)
        print(f"Split {len(documents)} documents into {len(chunks)} chunks")
        return chunks


    chunks = split_documents(documents)
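
    To sanity-check the chunking before indexing, it helps to peek at one chunk's text and metadata (a quick inspection added here; not part of the core pipeline):

    # Inspect the first chunk: text preview plus source page metadata
    print(chunks[0].page_content[:200])
    print(chunks[0].metadata)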
    

    We’ll use sentence-transformers to create vector embeddings for our document chunks:

    def create_vector_store(chunks):
        print("Loading embedding model...")
        embedding_model = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L6-v2",
            model_kwargs={'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
        )

        print("Creating vector store...")
        vector_store = FAISS.from_documents(chunks, embedding_model)
        print("Vector store created successfully!")
        return vector_store


    vector_store = create_vector_store(chunks)
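
    Before wiring up the full chain, a direct similarity search confirms the index returns sensible passages (the query string below is just an example):

    # Quick retrieval check: fetch the two chunks most similar to a test query
    hits = vector_store.similarity_search("What is self-attention?", k=2)
    for doc in hits:
        print(doc.page_content[:150], "\n---")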
    

    Now, let’s load an open-source language model to generate responses. We’ll use TinyLlama, which is small enough to run on Colab but still powerful enough for our task:

    def load_language_model():
        print("Loading language model...")
        model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

        try:
            import subprocess
            print("Installing/updating bitsandbytes...")
            subprocess.check_call(["pip", "install", "-U", "bitsandbytes"])
            print("Successfully installed/updated bitsandbytes")
        except Exception:
            print("Could not update bitsandbytes, will proceed without 8-bit quantization")

        from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
        import torch

        tokenizer = AutoTokenizer.from_pretrained(model_id)

        if torch.cuda.is_available():
            try:
                quantization_config = BitsAndBytesConfig(
                    load_in_8bit=True,
                    llm_int8_threshold=6.0,
                    llm_int8_has_fp16_weight=False
                )

                model = AutoModelForCausalLM.from_pretrained(
                    model_id,
                    torch_dtype=torch.bfloat16,
                    device_map="auto",
                    quantization_config=quantization_config
                )
                print("Model loaded with 8-bit quantization")
            except Exception as e:
                print(f"Error with quantization: {e}")
                print("Falling back to standard model loading without quantization")
                model = AutoModelForCausalLM.from_pretrained(
                    model_id,
                    torch_dtype=torch.bfloat16,
                    device_map="auto"
                )
        else:
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.float32,
                device_map="auto"
            )

        pipe = pipeline(
            "text-generation",
            model=model,
            tokenizer=tokenizer,
            max_length=2048,
            do_sample=True,          # required for temperature/top_p to take effect
            temperature=0.2,
            top_p=0.95,
            repetition_penalty=1.2,
            return_full_text=False
        )

        from langchain_community.llms import HuggingFacePipeline
        llm = HuggingFacePipeline(pipeline=pipe)
        print("Language model loaded successfully!")
        return llm


    llm = load_language_model()
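
    A quick smoke test confirms the wrapped pipeline generates text before we attach retrieval (the prompt here is illustrative):

    # Sanity check: the LangChain-wrapped pipeline should return a short completion
    print(llm.invoke("Briefly, what is a transformer in machine learning?"))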
    

    Now, let’s build our assistant by combining the vector store and language model. The core is a create_research_assistant helper that wires the FAISS retriever and the LLM into a conversational chain. Below is a minimal sketch of that helper, assuming LangChain’s ConversationalRetrievalChain with return_source_documents=True, a retrieval depth of k=3, and a simple in-memory chat history (those settings are assumptions, not fixed requirements):
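
    def create_research_assistant(vector_store, llm):
        # Retrieve the top-3 most similar chunks per query (k=3 is an assumption)
        retriever = vector_store.as_retriever(search_kwargs={"k": 3})
        qa_chain = ConversationalRetrievalChain.from_llm(
            llm=llm,
            retriever=retriever,
            return_source_documents=True
        )
        chat_history = []  # simple in-memory conversation state

        def ask(query, return_sources=False):
            result = qa_chain({"question": query, "chat_history": chat_history})
            chat_history.append((query, result["answer"]))
            if return_sources:
                return result["answer"], result["source_documents"]
            return result["answer"]

        return ask

    With the assistant in place, a small helper formats each answer together with the source chunks it was grounded in, and we exercise it with a few test queries: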

    import textwrap

    def format_research_assistant_output(query, response, sources):
        output = f"\n{'=' * 50}\n"
        output += f"USER QUERY: {query}\n"
        output += f"{'-' * 50}\n\n"
        output += f"ASSISTANT RESPONSE:\n{response}\n\n"
        output += f"{'-' * 50}\n"
        output += f"SOURCES REFERENCED:\n\n"

        for i, doc in enumerate(sources):
            output += f"Source #{i+1}:\n"
            content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
            wrapped_content = textwrap.fill(content_preview, width=80)
            output += f"{wrapped_content}\n\n"

        output += f"{'=' * 50}\n"
        return output


    research_assistant = create_research_assistant(vector_store, llm)

    test_queries = [
        "What is the key idea behind the Transformer model?",
        "Explain self-attention mechanism in simple terms.",
        "Who are the authors of the paper?",
        "What are the main advantages of using attention mechanisms?"
    ]

    for query in test_queries:
        response, sources = research_assistant(query, return_sources=True)
        formatted_output = format_research_assistant_output(query, response, sources)
        print(formatted_output)
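
    To query the assistant interactively rather than from a fixed list, a simple read-eval loop works in Colab (an optional addition; type 'quit' to stop):

    # Optional: interactive question loop against the indexed paper
    while True:
        user_query = input("\nAsk a question (or 'quit'): ").strip()
        if user_query.lower() in {"quit", "exit", ""}:
            break
        answer, docs = research_assistant(user_query, return_sources=True)
        print(format_research_assistant_output(user_query, answer, docs))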

    In this tutorial, we built a conversational research assistant using Retrieval-Augmented Generation with open-source models. RAG enhances language models by integrating document retrieval, which reduces hallucination and keeps answers grounded in domain-specific sources. The guide walked through setting up the environment, processing scientific papers, creating vector embeddings with sentence transformers and indexing them with FAISS, and integrating an open-source language model, TinyLlama, through LangChain. The resulting assistant retrieves relevant document chunks and generates responses with citations, letting users query a knowledge base and making AI-powered research more reliable and efficient for domain-specific questions.

