    Google AI Introduces an Efficient Machine Learning Method to Scale Transformer-based Large Language Models (LLMs) to Infinitely Long Inputs

    April 14, 2024

Memory is essential to intelligence: it lets a system recall past experiences and apply them to the current situation. However, because of how their attention mechanism works, both conventional Transformer models and Transformer-based Large Language Models (LLMs) have limited context-dependent memory: the memory consumption and computation time of standard attention both grow quadratically with the length of the input sequence.
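
To make the quadratic scaling concrete, here is a minimal NumPy sketch (not from the paper; all names are illustrative) of single-head scaled dot-product attention. The L x L score matrix it materializes is what makes both memory and compute grow quadratically with the sequence length L.

```python
# Illustrative only: standard attention builds an L x L score matrix.
import numpy as np

def standard_attention(Q, K, V):
    # Q, K, V: (L, d) for a single head; names and shapes are illustrative.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)            # (L, L) score matrix: the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # (L, d)

L, d = 4096, 64
Q, K, V = (np.random.randn(L, d) for _ in range(3))
out = standard_attention(Q, K, V)             # the score matrix alone holds L * L floats
```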

Compressive memory systems offer a promising alternative, aiming to be more efficient and scalable for very long sequences. They keep storage and computation costs in check by maintaining a fixed number of parameters for storing and retrieving information, in contrast to classical attention, whose memory must grow with the length of the input sequence.

The memory's parameters are adjusted so that new information is assimilated while remaining retrievable later. However, existing LLMs have yet to adopt a compressive memory method that strikes an effective balance between simplicity and quality.
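
As a rough illustration of the idea, the sketch below implements a fixed-size associative-matrix memory in the style of linear attention. This is an assumption-laden simplification, not the paper's exact formulation: the feature map, normalization, and update rule shown are just one common choice.

```python
# Illustrative sketch (assumptions, not the paper's exact equations): a compressive
# memory stores information in a fixed-size associative matrix plus a normalizer,
# so its footprint does not grow with the number of tokens written into it.
import numpy as np

def phi(x):
    # Non-negative feature map; ELU + 1 is one common choice.
    return np.where(x > 0, x + 1.0, np.exp(x))

class CompressiveMemory:
    def __init__(self, d_key, d_value):
        self.M = np.zeros((d_key, d_value))   # associative matrix (fixed size)
        self.z = np.zeros(d_key)              # normalization term (fixed size)

    def write(self, K, V):
        # K: (n, d_key), V: (n, d_value) -- absorb n new key/value pairs.
        self.M += phi(K).T @ V
        self.z += phi(K).sum(axis=0)

    def read(self, Q):
        # Q: (n, d_key) -> retrieved values of shape (n, d_value).
        q = phi(Q)
        return (q @ self.M) / (q @ self.z + 1e-6)[:, None]
```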

To overcome these limitations, a team of researchers from Google has proposed a solution that allows Transformer LLMs to handle arbitrarily long inputs with a bounded memory footprint and bounded compute. The key component of their approach is an attention mechanism called Infini-attention, which combines long-term linear attention and masked local attention within a single Transformer block and incorporates compressive memory into the conventional attention process.

The primary breakthrough of Infini-attention is its ability to manage memory effectively while processing long sequences. By using compressive memory, the model can store and recall information with a fixed set of parameters, eliminating the need for memory to grow with the length of the input. This keeps computing costs within reasonable bounds and helps control memory consumption.
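
A hedged sketch of how such a combination could look, reusing the CompressiveMemory sketch above: queries first read long-range context from the memory, ordinary causal attention handles the local segment, and a learned gate blends the two outputs. The gating and shapes here are simplified guesses rather than the paper's implementation.

```python
# Simplified illustration of blending local attention with a compressive memory.
import numpy as np

def causal_local_attention(Q, K, V):
    # Standard masked attention restricted to the current segment.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def infini_attention_segment(Q, K, V, memory, beta):
    # memory: a CompressiveMemory as sketched above; beta: learned gating scalar.
    A_mem = memory.read(Q)                     # long-term context from fixed-size state
    A_local = causal_local_attention(Q, K, V)  # short-range context within the segment
    memory.write(K, V)                         # fold this segment into the memory
    g = 1.0 / (1.0 + np.exp(-beta))            # sigmoid gate between the two paths
    return g * A_mem + (1.0 - g) * A_local
```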

The team reports that the method is effective on a range of tasks, including book summarization with input sequences of 500,000 tokens, passkey retrieval from context blocks in sequences of up to 1 million tokens, and long-context language modeling benchmarks. LLMs ranging from 1 billion to 8 billion parameters were used for these tasks.

One of the main advantages of the approach is its minimal, bounded set of memory parameters, which makes the model's memory requirements predictable and easy to cap. The method also enables fast streaming inference for LLMs, allowing sequential input to be analyzed efficiently in real-time or near-real-time settings.
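
The streaming property can be illustrated with a simple segment-by-segment loop over the sketches above (again an assumption-level illustration, not the released code): peak memory depends on the segment length and the constant-size memory state, not on the total input length.

```python
# Illustrative streaming loop: process an arbitrarily long input in fixed-size chunks.
import numpy as np

def stream_infini_attention(Q, K, V, segment_len, beta=0.0):
    # Q, K, V: (N, d) for the full input; only segment-sized blocks are attended at once.
    d = Q.shape[-1]
    memory = CompressiveMemory(d_key=d, d_value=d)  # fixed-size state across segments
    outputs = []
    for start in range(0, Q.shape[0], segment_len):
        end = start + segment_len
        outputs.append(
            infini_attention_segment(Q[start:end], K[start:end], V[start:end],
                                     memory, beta)
        )
    return np.concatenate(outputs, axis=0)

# Usage: only segment_len x segment_len attention blocks are ever materialized,
# plus the constant-size compressive memory, regardless of the total length N.
N, d, seg = 8192, 64, 512
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = stream_infini_attention(Q, K, V, segment_len=seg)
```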

The team has summarized their primary contributions as follows:

The team has presented Infini-attention, a novel attention mechanism that blends local causal attention with long-term compressive memory. The method is both practical and effective, capturing contextual dependencies over both short and long ranges.

The standard scaled dot-product attention mechanism needs only minor modification to accommodate Infini-attention. This enables plug-and-play continuous pre-training and long-context adaptation, and makes the mechanism simple to incorporate into existing Transformer architectures.

The method keeps memory and computational resources bounded while allowing Transformer-based LLMs to accommodate indefinitely long contexts. By processing very long inputs in a streaming fashion, the approach ensures efficient resource utilization and enables LLMs to perform well in large-scale, real-world applications.

In conclusion, this study is a major step forward for LLMs, enabling very long inputs to be handled efficiently in terms of both computation and memory.

Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Google AI Introduces an Efficient Machine Learning Method to Scale Transformer-based Large Language Models (LLMs) to Infinitely Long Inputs appeared first on MarkTechPost.
