
    Google’s Infini-attention gives LLMs “infinite” context

    April 15, 2024

    Google researchers have developed a technique called Infini-attention, which lets LLMs handle infinitely long text while keeping compute and memory requirements bounded.

    The Transformer architecture of an LLM is what allows it to give attention to all of the tokens in a prompt. The dot-product attention and matrix multiplications it performs scale quadratically with the number of tokens.

    This means that doubling the tokens in a prompt requires four times the memory and processing power, which is why it is so challenging to build LLMs with large context windows without memory and compute requirements skyrocketing.
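
    To make that scaling concrete, here is a minimal NumPy sketch (illustrative only, not how any production LLM is implemented) that materializes the n-by-n attention score matrix for a single head and measures its memory footprint; doubling the sequence length quadruples it.

        import numpy as np

        def attention_score_bytes(n_tokens, d_head=64, dtype=np.float32):
            # Standard attention builds an (n_tokens x n_tokens) score matrix:
            # scores = Q @ K.T / sqrt(d_head), followed by a softmax over each row.
            q = np.random.randn(n_tokens, d_head).astype(dtype)
            k = np.random.randn(n_tokens, d_head).astype(dtype)
            scores = (q @ k.T) / np.sqrt(d_head)
            return scores.nbytes

        for n in (1_000, 2_000, 4_000):
            print(n, attention_score_bytes(n) / 1e6, "MB")  # 4 MB, 16 MB, 64 MB per head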

    In a “standard” LLM, information at the beginning of the prompt content is lost once the prompt becomes larger than the context window. Google’s research paper explains how Infini-attention can retain data beyond the context window.

    "Google presents Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. 1B model that was fine-tuned on up to 5K sequence length passkey instances solves the 1M length problem." https://t.co/zyHMt3inhi

    — Aran Komatsuzaki (@arankomatsuzaki) April 11, 2024

    How does Infini-attention work?

    Infini-attention combines compressive memory techniques with modified attention mechanisms so that relevant older information isn’t lost.

    Once the input prompt grows beyond the context length of the model, the compressive memory stores information in a compressed format rather than discarding it.

    This allows for older, less immediately relevant information to be stored without memory and compute requirements growing indefinitely as the input grows.

    Instead of trying to retain all the older input information, Infini-attention’s compressive memory weighs and summarizes information that is deemed relevant and worth retaining.

    Infini-attention then takes a “vanilla” attention mechanism but reuses the key-value (KV) states from previous segments rather than discarding them as each new segment is processed.
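
    As a rough sketch of the idea (with hypothetical names and shapes, not the authors' code), the compressive memory can be pictured as a fixed-size matrix built from past keys and values: each segment folds its KV states into the matrix, and later queries read from it at a cost that does not grow with how much history has been stored. The ELU+1 feature map below follows the linear-attention style the paper builds on, but treat the details as assumptions.

        import numpy as np

        def feature_map(x):
            # Non-negative feature map (ELU + 1), an assumption borrowed from linear attention.
            return np.where(x > 0, x + 1.0, np.exp(x))

        class CompressiveMemory:
            """Fixed-size store that accumulates past KV states instead of discarding them."""
            def __init__(self, d_key, d_value):
                self.M = np.zeros((d_key, d_value))   # compressed key-value associations
                self.z = np.zeros((d_key, 1))         # normalization term

            def update(self, K, V):
                # Fold the current segment's keys/values into the memory (constant size).
                sK = feature_map(K)                   # (seg_len, d_key)
                self.M += sK.T @ V                    # (d_key, d_value)
                self.z += sK.sum(axis=0, keepdims=True).T

            def retrieve(self, Q):
                # Long-term read: cost depends only on d_key/d_value, not on history length.
                sQ = feature_map(Q)                   # (seg_len, d_key)
                return (sQ @ self.M) / (sQ @ self.z + 1e-6)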

    Here’s a diagram that shows the difference between Infini-attention and another extended-context model, Transformer-XL.

    Infini-Transformer (top) retains the entire context history, whereas Transformer-XL (bottom) discards old contexts because it caches the KV states for the last segment only. Source: arXiv

    The result is an LLM that gives local attention to recent input data but also carries continuously distilled compressed historical data to which it can apply long-term attention.

    The paper notes that “This subtle but critical modification to the attention layer enables LLMs to process infinitely long contexts with bounded memory and computation resources.”
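
    Putting the pieces together, a hedged per-segment loop might look like the sketch below: ordinary softmax attention within the current segment, a read from the compressive memory for long-term context, and a learned gate that blends the two before the segment is folded into memory. It reuses the CompressiveMemory class from the earlier sketch; the scalar gate and the shared Q/K/V in the usage example are simplifications for illustration, not the paper's exact implementation.

        import numpy as np

        def softmax(x, axis=-1):
            e = np.exp(x - x.max(axis=axis, keepdims=True))
            return e / e.sum(axis=axis, keepdims=True)

        def infini_attention_segment(Q, K, V, memory, gate_logit=0.0):
            d = Q.shape[-1]
            local = softmax(Q @ K.T / np.sqrt(d)) @ V     # attention within this segment only
            long_term = memory.retrieve(Q)                # distilled history from compressive memory
            g = 1.0 / (1.0 + np.exp(-gate_logit))         # learned scalar gate (assumed per head)
            out = g * long_term + (1.0 - g) * local       # blend long-term and local context
            memory.update(K, V)                           # then fold this segment into memory
            return out

        # Hypothetical usage over a long input split into segments:
        memory = CompressiveMemory(d_key=64, d_value=64)
        for segment in np.split(np.random.randn(8 * 128, 64), 8):
            Q = K = V = segment                           # real models use separate projections
            out = infini_attention_segment(Q, K, V, memory)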

    How good is it?

    Google ran benchmarking tests using smaller 1B and 8B parameter Infini-attention models. These were compared against other extended context models like Transformer-XL and Memorizing Transformers.

    The Infini-Transformer achieved significantly lower perplexity scores than the other models when processing long-context content. A lower perplexity score means the model is more certain of its output predictions.
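
    For reference, perplexity is the exponential of the average per-token negative log-likelihood, so a lower score means the model assigned higher probability to the text it actually saw. A minimal illustration with hypothetical probabilities:

        import numpy as np

        def perplexity(token_probs):
            # token_probs: probability the model assigned to each actual next token
            return float(np.exp(-np.mean(np.log(token_probs))))

        print(perplexity([0.5, 0.25, 0.5]))   # ~2.52
        print(perplexity([0.9, 0.8, 0.9]))    # ~1.16 -- more confident, lower perplexity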

    In the “passkey retrieval” tests, the Infini-attention models consistently found the random number hidden in text of up to 1M tokens.

    Other models often manage to retrieve the passkey towards the end of the input but struggle to find it in the middle or beginning of long content. Infini-attention had no trouble with this test.
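
    The passkey test is easy to reproduce in spirit: hide a random number at a chosen depth inside long filler text and ask the model to repeat it back. A hypothetical prompt builder (the filler sentence and word counts are assumptions):

        import random

        FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. "

        def passkey_prompt(total_words=100_000, depth=0.5):
            # depth=0.0 hides the key at the start, 0.5 in the middle, 1.0 near the end.
            passkey = random.randint(10_000, 99_999)
            needle = f"The pass key is {passkey}. Remember it."
            words = (FILLER * (total_words // 15 + 1)).split()[:total_words]
            insert_at = int(len(words) * depth)
            prompt = " ".join(words[:insert_at] + [needle] + words[insert_at:])
            return prompt + " What is the pass key?", passkey

        prompt, key = passkey_prompt(total_words=5_000, depth=0.1)  # key buried near the beginning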

    The benchmarking tests are very technical, but the short story is that Infini-attention outperformed the baseline models in summarizing and handling long sequences while maintaining context over extended periods.

    Significantly, it did so while requiring 114x less memory.

    The benchmark results convince the researchers that Infini-attention could be scaled to handle extremely long input sequences while keeping memory and computational resources bounded.

    The plug-and-play nature of Infini-attention means it could be used for continual pre-training and fine-tuning of existing Transformer models. This could effectively extend their context windows without requiring complete retraining of the model.

    Context windows will keep growing, but this approach shows that an efficient memory could be a better solution than a large library.

    The post Google’s Infini-attention gives LLMs “infinite” context appeared first on DailyAI.
