    Google AI Proposes TransformerFAM: A Novel Transformer Architecture that Leverages a Feedback Loop to Enable the Neural Network to Attend to Its Latent Representations

    April 17, 2024

    Transformers have revolutionized deep learning, yet their quadratic attention complexity limits their ability to process infinitely long inputs. Despite their effectiveness, they suffer from drawbacks such as forgetting information beyond the attention window and struggling with long-context processing. Attempts to address this include sliding window attention and sparse or linear approximations, but these often fall short at large scales. Drawing inspiration from neuroscience, particularly the link between attention and working memory, the proposed solution is to let the Transformer attend to its own latent representations via a feedback loop within the Transformer blocks, potentially leading to the emergence of working memory in Transformers.

    Google researchers have developed TransformerFAM, a novel Transformer architecture that employs a feedback loop to enable self-attention over the network’s own latent representations, facilitating the emergence of working memory. The approach improves Transformer performance on long-context tasks across model sizes (1B, 8B, and 24B) without adding any weights, so it integrates seamlessly with pre-trained models and allows existing checkpoints to be reused. TransformerFAM can maintain past information indefinitely, making it a promising approach for handling infinitely long input sequences in LLMs. Fine-tuning TransformerFAM with LoRA for 50k steps significantly improves performance across 1B, 8B, and 24B Flan-PaLM LLMs.
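
    To make the fine-tuning recipe concrete, the sketch below shows how a pre-trained checkpoint can be reused unchanged while LoRA adapters supply the only trainable parameters. Flan-PaLM checkpoints are not publicly available, so a generic Hugging Face causal LM ("gpt2") stands in purely as a placeholder; the module names and hyperparameters are illustrative assumptions, not the paper’s setup.

```python
# Hedged sketch: attach LoRA adapters to a pre-trained causal LM so the base
# checkpoint is reused as-is and only the low-rank adapter matrices are trained.
# "gpt2" and the target module name "c_attn" are placeholders, not the paper's setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are marked trainable
```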

    Prior attempts to incorporate feedback mechanisms in Transformers mainly focused on passing output activations from top layers to lower or intermediate ones, neglecting potential representational gaps. Some research compressed information blockwise, using recurrent cross-attention between blocks or feedback from upper layers to carry past information into subsequent blocks, but none of these ensured infinite propagation. To overcome the quadratic complexity of Transformer attention in context length, approaches such as sparse attention and linear approximations have been explored, while alternatives to attention-based Transformers include MLP-Mixer and State Space Models. TransformerFAM draws inspiration from Global Workspace Theory, aiming for a unified attention mechanism that processes various data types.

    Two primary approaches are commonly employed to handle long-context inputs: increasing computational resources or using Sliding Window Attention (SWA). SWA, introduced in Big Bird, partitions the input into blocks and caches information block by block, a strategy termed Block Sliding Window Attention (BSWA). Unlike standard SWA, BSWA attends to all information within its ring buffer without masking out past keys and values. It employs two hyperparameters, block size and memory segment, to control the size and scope of the attended information. While BSWA offers linear complexity compared to the quadratic complexity of standard Transformers, its receptive field is limited, which calls for further innovation to handle long-context dependencies effectively.
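
    To make the mechanism concrete, below is a minimal sketch of blockwise attention with a ring buffer of cached past blocks. It is an illustration under simplifying assumptions (single head, no projections, no causal masking inside the current block), not the paper’s implementation; the function and argument names are invented for the example.

```python
# Minimal sketch of Block Sliding Window Attention (BSWA), assuming single-head
# attention and the two hyperparameters described above: block_size and
# mem_segments (how many past blocks the ring buffer keeps).
import torch
import torch.nn.functional as F

def bswa(x, block_size=4, mem_segments=2):
    """x: (seq_len, dim). Each block of queries attends to the current block plus
    the keys/values of up to `mem_segments` cached past blocks (no masking of the buffer)."""
    seq_len, dim = x.shape
    outputs, cache = [], []                      # cache acts as the ring buffer of past blocks
    for start in range(0, seq_len, block_size):
        block = x[start:start + block_size]
        kv = torch.cat(cache + [block], dim=0)   # past blocks + current block
        attn = F.softmax(block @ kv.T / dim ** 0.5, dim=-1)
        outputs.append(attn @ kv)
        cache.append(block)
        if len(cache) > mem_segments:            # drop the oldest block
            cache.pop(0)
    return torch.cat(outputs, dim=0)

print(bswa(torch.randn(16, 8)).shape)            # torch.Size([16, 8]); cost is linear in seq_len
```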

    FAM is developed in response to this challenge, building on BSWA’s blockwise structure. FAM integrates feedback activations, dubbed virtual activations, into each block, enabling the dynamic propagation of global contextual information across blocks. The architecture fulfills key requirements such as integrated attention, block-wise updates, information compression, and global contextual storage. Incorporating FAM enriches the representations and propagates comprehensive contextual information across the sequence, surpassing the limitations of BSWA. Despite initial concerns that the feedback mechanism might be inefficient, the vectorized-map-based blockwise self-attention keeps training efficient, with minimal impact on memory consumption and training speed, maintaining parity with TransformerBSWA.
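
    The sketch below extends the BSWA loop with a small set of feedback activations that each block attends to alongside its ring buffer, and that are then refreshed by attending over that block, so compressed global context is carried forward block after block. The shapes, zero initialization, and update rule are illustrative assumptions, not the paper’s exact formulation.

```python
# Hedged sketch of FAM on top of blockwise attention: `fam` plays the role of the
# feedback ("virtual") activations, i.e. an attention-based working memory.
import torch
import torch.nn.functional as F

def fam_bswa(x, block_size=4, mem_segments=2, fam_len=2):
    seq_len, dim = x.shape
    fam = torch.zeros(fam_len, dim)              # feedback activations (working memory)
    outputs, cache = [], []
    for start in range(0, seq_len, block_size):
        block = x[start:start + block_size]
        # 1) Block queries attend to the ring buffer, the current block, and the FAM.
        kv = torch.cat(cache + [block, fam], dim=0)
        attn = F.softmax(block @ kv.T / dim ** 0.5, dim=-1)
        outputs.append(attn @ kv)
        # 2) FAM queries attend to the current block and the previous FAM,
        #    compressing them into updated feedback activations for the next block.
        fam_kv = torch.cat([block, fam], dim=0)
        fam = F.softmax(fam @ fam_kv.T / dim ** 0.5, dim=-1) @ fam_kv
        cache.append(block)
        if len(cache) > mem_segments:
            cache.pop(0)
    return torch.cat(outputs, dim=0)

print(fam_bswa(torch.randn(16, 8)).shape)        # torch.Size([16, 8])
```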

    In the movie “Memento,” the protagonist’s struggle with anterograde amnesia parallels the current limitations of LLMs: while LLMs possess vast long-term memory capabilities, their short-term memory is restricted by the attention window. TransformerFAM offers a way to address this form of amnesia in LLMs, leveraging attention-based working memory inspired by neuroscience. The study hints at a path toward resolving the memory challenge in deep learning, a crucial precursor to tackling broader issues such as reasoning.

    Check out the Paper. All credit for this research goes to the researchers of this project.


    The post Google AI Proposes TransformerFAM: A Novel Transformer Architecture that Leverages a Feedback Loop to Enable the Neural Network to Attend to Its Latent Representations appeared first on MarkTechPost.

