
    KBLAM: Efficient Knowledge Base Augmentation for Large Language Models Without Retrieval Overhead

    March 21, 2025

LLMs have demonstrated strong reasoning and knowledge capabilities, yet they often require external knowledge augmentation when their internal representations lack specific details. One method for incorporating new information is supervised fine-tuning, where models are trained on additional datasets to update their weights. However, this approach is inefficient because it requires retraining whenever new knowledge is introduced, and it may lead to catastrophic forgetting, degrading the model’s performance on general tasks. To overcome these limitations, alternative techniques that preserve the model’s weights have gained popularity. Retrieval-Augmented Generation (RAG) is one such approach: it retrieves relevant knowledge from unstructured text and appends it to the input query before passing it through the model. By dynamically retrieving information, RAG enables LLMs to access large knowledge bases while maintaining a smaller context size. However, as long-context models such as GPT-4 and Gemini have emerged, researchers have explored in-context learning, where external knowledge is provided directly in the model’s input. This eliminates the need for retrieval but comes with computational challenges, as processing long contexts requires significantly more memory and time.
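The retrieve-then-append pattern described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the pipeline from the paper: the character-hashing `embed` function is a toy stand-in for a real sentence encoder, and all names are illustrative.

```python
import numpy as np

# Toy embedder: bag-of-characters hashing, standing in for a real
# sentence encoder. Purely illustrative -- not a real retrieval model.
def embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for i, ch in enumerate(text.lower()):
        vec[(hash(ch) + i) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by cosine similarity to the query embedding.
    q = embed(query)
    scores = [float(q @ embed(d)) for d in docs]
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

docs = [
    "KBLAM encodes knowledge triples as key-value vectors.",
    "The Surface Laptop Studio is a convertible laptop.",
]
question = "How does KBLAM encode knowledge?"
context = retrieve(question, docs, k=1)
# RAG-style prompt assembly: retrieved text is prepended to the query.
prompt = "\n".join(context) + "\n\nQuestion: " + question
```

The key property RAG exploits is that only the retrieved snippets, not the whole knowledge base, enter the model's context window.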

Several advanced techniques have been developed to help LLMs integrate external knowledge more efficiently. Structured attention mechanisms improve memory efficiency by segmenting the context into independent sections, reducing the computational load of self-attention. Key-value (KV) caching optimizes response generation by storing precomputed embeddings at different layers, allowing the model to recall relevant information without recalculating it. This reduces the complexity from quadratic to linear with respect to context length. Unlike traditional KV caching, which requires full recomputation when the input changes, newer methods allow selective updates, making external knowledge integration more flexible.
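The per-step saving from KV caching can be seen in a small simulation. This is a generic sketch of cached autoregressive attention, assuming random vectors in place of real model activations; at each decoding step only one new key/value pair is appended, so the work per token grows linearly with the prefix rather than recomputing the full quadratic attention from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # head dimension

def attend(q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    # Single-query scaled dot-product attention over cached keys/values.
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

# Simulated autoregressive decoding with a KV cache: each step appends
# one key/value pair instead of re-projecting the entire prefix.
K_cache, V_cache = [], []
outputs = []
for step in range(5):
    k, v, q = rng.normal(size=(3, d))  # stand-ins for this step's projections
    K_cache.append(k)
    V_cache.append(v)
    outputs.append(attend(q, np.stack(K_cache), np.stack(V_cache)))
```

At step `t` the cached attention touches `t` key/value pairs once, versus `t^2` pair interactions if the whole prefix were re-attended at every step.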

    Researchers from Johns Hopkins University and Microsoft propose a Knowledge Base Augmented Language Model (KBLAM), a method for integrating external knowledge into LLMs. KBLAM converts structured knowledge base (KB) triples into key-value vector pairs, seamlessly embedding them within the LLM’s attention layers. Unlike RAG, it eliminates external retrievers, and unlike in-context learning, it scales linearly with KB size. KBLAM enables efficient dynamic updates without retraining and enhances interpretability. Trained using instruction tuning on synthetic data, it improves reliability by refusing to answer when relevant knowledge is absent, reducing hallucinations and enhancing scalability.
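The triple-to-token mapping can be sketched as follows. This is a hedged toy version of the idea, not the paper's implementation: the "sentence encoder" is a deterministic random projection standing in for a real pre-trained encoder, and the adapter matrices, dimensions, and function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
enc_dim, head_dim = 32, 8  # illustrative sizes only

# Stand-in for a frozen pre-trained sentence encoder: a deterministic
# pseudo-random embedding per input string.
def sentence_encode(text: str) -> np.ndarray:
    local = np.random.default_rng(abs(hash(text)) % (2**32))
    return local.normal(size=enc_dim)

# Trainable linear adapters mapping encoder space into the LLM's
# key/value space (randomly initialized here; learned in practice).
W_k = rng.normal(size=(head_dim, enc_dim)) * 0.1
W_v = rng.normal(size=(head_dim, enc_dim)) * 0.1

def triple_to_knowledge_token(name: str, prop: str, value: str):
    # The key encodes what the triple is about (entity + property);
    # the value encodes the triple's content.
    key = W_k @ sentence_encode(f"{name} {prop}")
    val = W_v @ sentence_encode(value)
    return key, val

kb = [("KBLAM", "institution", "Johns Hopkins University and Microsoft")]
knowledge_tokens = [triple_to_knowledge_token(*t) for t in kb]
```

Because each triple maps to one fixed-size key/value pair independently of the others, a triple can be added, updated, or removed without touching the rest of the knowledge base or retraining the LLM.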

KBLAM enhances LLMs by integrating a KB in two steps. First, each KB triple is converted into continuous key-value embeddings, termed knowledge tokens, using a pre-trained sentence encoder and linear adapters. These tokens are then incorporated into each attention layer via a rectangular attention structure, allowing efficient retrieval without altering the LLM’s core parameters. This method ensures scalability, mitigates positional bias, and maintains reasoning abilities. Additionally, instruction tuning optimizes the knowledge token projection without modifying the LLM, using a synthetic KB to prevent memorization. This approach efficiently integrates large KBs while preserving the model’s original capabilities.
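The "rectangular" shape of this attention can be made concrete. In the sketch below (a simplified illustration under assumed shapes, with causal masking among prompt tokens omitted for brevity), prompt queries attend over both knowledge tokens and prompt tokens, but knowledge tokens never act as queries, so the attention matrix is `n_prompt x (n_kb + n_prompt)` rather than a square over the full concatenation, and cost grows linearly with KB size.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_kb, n_prompt = 8, 4, 3  # illustrative sizes

# Precomputed knowledge-token keys/values, and prompt-token projections.
K_kb, V_kb = rng.normal(size=(2, n_kb, d))
K_p, V_p, Q_p = rng.normal(size=(3, n_prompt, d))

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Rectangular attention: only prompt tokens produce queries; knowledge
# tokens are attended to but never attend, so there is no n_kb x n_kb
# quadratic term over the knowledge base.
K = np.concatenate([K_kb, K_p])            # (n_kb + n_prompt, d)
V = np.concatenate([V_kb, V_p])
A = softmax(Q_p @ K.T / np.sqrt(d))        # (n_prompt, n_kb + n_prompt)
out = A @ V                                # (n_prompt, d)
```

Doubling `n_kb` doubles the width of `A` but leaves its height fixed, which is the source of the linear scaling in KB size.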

The empirical evaluation of KBLAM demonstrates its effectiveness as a knowledge retrieval and reasoning model. After instruction tuning, its attention matrix exhibits interpretable patterns, allowing accurate retrieval. KBLAM achieves performance comparable to in-context learning while significantly reducing memory usage, and it maintains scalability up to 10K triples. It can also refuse to answer when no relevant knowledge is found, with “over-refusal” setting in later than it does for in-context learning. The model is trained on an instruction-tuned Llama3-8B and optimized using AdamW. Evaluations on synthetic and Enron datasets confirm KBLAM’s strong retrieval accuracy, efficient knowledge integration, and ability to minimize hallucinations.

    In conclusion, KBLAM is an approach for enhancing LLMs with external KBs. It encodes KB entries as continuous key-value vector pairs using pre-trained sentence encoders with linear adapters and integrates them into LLMs through a specialized attention mechanism. Unlike Retrieval-Augmented Generation, KBLAM removes external retrieval modules, and unlike in-context learning, it scales linearly with KB size. This enables efficient integration of over 10K triples into an 8B LLM within an 8K context window on a single A100 GPU. Experiments show its effectiveness in question-answering and reasoning tasks while maintaining interpretability and enabling dynamic knowledge updates.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

    The post KBLAM: Efficient Knowledge Base Augmentation for Large Language Models Without Retrieval Overhead appeared first on MarkTechPost.
