Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      How AI further empowers value stream management

      June 27, 2025

      12 Top ReactJS Development Companies in 2025

      June 27, 2025

      Not sure where to go with AI? Here’s your roadmap.

      June 27, 2025

      This week in AI dev tools: A2A donated to Linux Foundation, OpenAI adds Deep Research to API, and more (June 27, 2025)

      June 27, 2025

      Microsoft’s Copilot+ has been here over a year and I still don’t care about it — but I do wish I had one of its features

      June 29, 2025

      SteelSeries’ latest wireless mouse is cheap and colorful — but is this the one to spend your money on?

      June 29, 2025

      DistroWatch Weekly, Issue 1128

      June 29, 2025

      Your Slack app is getting a big upgrade – here’s how to try the new AI features

      June 29, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      How Code Feedback MCP Enhances AI-Generated Code Quality

      June 28, 2025
      Recent

      How Code Feedback MCP Enhances AI-Generated Code Quality

      June 28, 2025

      PRSS Site Creator – Create Blogs and Websites from Your Desktop

      June 28, 2025

      Say hello to ECMAScript 2025

      June 27, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Microsoft’s Copilot+ has been here over a year and I still don’t care about it — but I do wish I had one of its features

      June 29, 2025
      Recent

      Microsoft’s Copilot+ has been here over a year and I still don’t care about it — but I do wish I had one of its features

      June 29, 2025

      SteelSeries’ latest wireless mouse is cheap and colorful — but is this the one to spend your money on?

      June 29, 2025

      Microsoft confirms Windows 11 25H2, might make Windows more stable

      June 29, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»A Step-by-Step Guide to Build a Fast Semantic Search and RAG QA Engine on Web-Scraped Data Using Together AI Embeddings, FAISS Retrieval, and LangChain

    A Step-by-Step Guide to Build a Fast Semantic Search and RAG QA Engine on Web-Scraped Data Using Together AI Embeddings, FAISS Retrieval, and LangChain

    May 14, 2025

    In this tutorial, we lean hard on Together AI’s growing ecosystem to show how quickly we can turn unstructured text into a question-answering service that cites its sources. We’ll scrape a handful of live web pages, slice them into coherent chunks, and feed those chunks to the togethercomputer/m2-bert-80M-8k-retrieval embedding model. Those vectors land in a FAISS index for millisecond similarity search, after which a lightweight ChatTogether model drafts answers that stay grounded in the retrieved passages. Because Together AI handles embeddings and chat behind a single API key, we avoid juggling multiple providers, quotas, or SDK dialects.

    Copy CodeCopiedUse a different Browser
    !pip -q install --upgrade langchain-core langchain-community langchain-together 
    faiss-cpu tiktoken beautifulsoup4 html2text
    

    This quiet (-q) pip command upgrades and installs everything the Colab RAG needs. It pulls core LangChain libraries plus the Together AI integration, FAISS for vector search, token-handling with tiktoken, and lightweight HTML parsing via beautifulsoup4 and html2text, ensuring the notebook runs end-to-end without additional setup.

    Copy CodeCopiedUse a different Browser
    import os, getpass, warnings, textwrap, json
    if "TOGETHER_API_KEY" not in os.environ:
        os.environ["TOGETHER_API_KEY"] = getpass.getpass("🔑 Enter your Together API key: ")

    We check whether the TOGETHER_API_KEY environment variable is already set; if not, it securely prompts us for the key with getpass and stores it in os.environ. The rest of the notebook can call Together AI’s API without hard‑coding secrets or exposing them in plain text by capturing the credentials once per runtime.

    Copy CodeCopiedUse a different Browser
    from langchain_community.document_loaders import WebBaseLoader
    URLS = [
        "https://python.langchain.com/docs/integrations/text_embedding/together/",
        "https://api.together.xyz/",
        "https://together.ai/blog"  
    ]
    raw_docs = WebBaseLoader(URLS).load()
    

    WebBaseLoader fetches each URL, strips boilerplate, and returns LangChain Document objects containing the clean page text plus metadata. By passing a list of Together-related links, we immediately collect live documentation and blog content that will later be chunked and embedded for semantic search.

    Copy CodeCopiedUse a different Browser
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
    docs = splitter.split_documents(raw_docs)
    
    
    print(f"Loaded {len(raw_docs)} pages → {len(docs)} chunks after splitting.")

    RecursiveCharacterTextSplitter slices every fetched page into ~800-character segments with a 100-character overlap so contextual clues aren’t lost at chunk boundaries. The resulting list docs holds these bite-sized LangChain Document objects, and the printout shows how many chunks were produced from the original pages, essential prep for high-quality embedding.

    Copy CodeCopiedUse a different Browser
    from langchain_together.embeddings import TogetherEmbeddings
    embeddings = TogetherEmbeddings(
        model="togethercomputer/m2-bert-80M-8k-retrieval"  
    )
    from langchain_community.vectorstores import FAISS
    vector_store = FAISS.from_documents(docs, embeddings)
    

    Here we instantiate Together AI’s 80 M-parameter m2-bert retrieval model as a drop-in LangChain embedder, then feed every text chunk into it while FAISS.from_documents builds an in-memory vector index. The resulting vector store supports millisecond-level cosine searches, turning our scraped pages into a searchable semantic database.

    Copy CodeCopiedUse a different Browser
    from langchain_together.chat_models import ChatTogether
    llm = ChatTogether(
        model="mistralai/Mistral-7B-Instruct-v0.3",        
        temperature=0.2,
        max_tokens=512,
    )
    

    ChatTogether wraps a chat-tuned model hosted on Together AI, Mistral-7B-Instruct-v0.3 to be used like any other LangChain LLM. A low temperature of 0.2 keeps answers grounded and repeatable, while max_tokens=512 leaves room for detailed, multi-paragraph responses without runaway cost.

    Copy CodeCopiedUse a different Browser
    from langchain.chains import RetrievalQA
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
        return_source_documents=True,
    )
    

    RetrievalQA stitches the pieces together: it takes our FAISS retriever (returning the top 4 similar chunks) and feeds those snippets into the llm using the simple “stuff” prompt template. Setting return_source_documents=True means each answer will return with the exact passages it relied on, giving us instant, citation-ready Q-and-A.

    Copy CodeCopiedUse a different Browser
    QUESTION = "How do I use TogetherEmbeddings inside LangChain, and what model name should I pass?"
    result = qa_chain(QUESTION)
    
    
    print("n🤖 Answer:n", textwrap.fill(result['result'], 100))
    print("n📄 Sources:")
    for doc in result['source_documents']:
        print(" •", doc.metadata['source'])

    Finally, we send a natural-language query through the qa_chain, which retrieves the four most relevant chunks, feeds them to the ChatTogether model, and returns a concise answer. It then prints the formatted response, followed by a list of source URLs, giving us both the synthesized explanation and transparent citations in one shot.

    Output from the Final Cell

    In conclusion, in roughly fifty lines of code, we built a complete RAG loop powered end-to-end by Together AI: ingest, embed, store, retrieve, and converse. The approach is deliberately modular, swap FAISS for Chroma, trade the 80 M-parameter embedder for Together’s larger multilingual model, or plug in a reranker without touching the rest of the pipeline. What remains constant is the convenience of a unified Together AI backend: fast, affordable embeddings, chat models tuned for instruction following, and a generous free tier that makes experimentation painless. Use this template to bootstrap an internal knowledge assistant, a documentation bot for customers, or a personal research aide.


    Check out the Colab Notebook here. Also, feel free to follow us on Twitter and don’t forget to join our 90k+ ML SubReddit.

    The post A Step-by-Step Guide to Build a Fast Semantic Search and RAG QA Engine on Web-Scraped Data Using Together AI Embeddings, FAISS Retrieval, and LangChain appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleBuild a financial research assistant using Amazon Q Business and Amazon QuickSight for generative AI–powered insights
    Next Article Agent-Based Debugging Gets a Cost-Effective Alternative: Salesforce AI Presents SWERank for Accurate and Scalable Software Issue Localization

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    June 29, 2025
    Machine Learning

    AWS costs estimation using Amazon Q CLI and AWS Cost Analysis MCP

    June 27, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    CVE-2025-48444 – Drupal Quick Node Block Authorization Bypass

    Common Vulnerabilities and Exposures (CVEs)

    Packet is an Android Quick Share App for Linux

    Linux

    Critical Wazuh bug exploited in growing Mirai botnet infection

    Security

    A Guide To Evaluating Your Organizational Product Portfolio

    Web Development

    Highlights

    Linux

    Mozilla Kills Pocket & Fakespot to Focus on Firefox

    May 22, 2025

    Mozilla is shutting down Pocket, the “read it later” service it acquired in 2017 and…

    CVE-2025-32396 – RT-Labs P-Net Heap-based Buffer Overflow

    May 7, 2025

    Microsoft Researchers Introduce ARTIST: A Reinforcement Learning Framework That Equips LLMs with Agentic Reasoning and Dynamic Tool Use

    May 10, 2025

    Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

    April 9, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.