This AI Research Diagnoses Problems in Recurrent Neural Network (RNN)-based Language Models and Corrects Them to Outperform Transformer-based Models on Long Sequence Tasks

    November 5, 2024

Recurrent Neural Networks were the trailblazers in natural language processing and set the cornerstone for future advances. RNNs were simple in structure, and their contextual memory and constant state size promised the capacity to handle long sequence tasks. While in theory the design of RNNs pointed toward a great future in long-context tasks, in practice the results were far from satisfactory: as the context length increased, performance dropped dramatically. Even the latest SOTA RNN-based language models, such as Mamba-1, performed poorly once the context length exceeded the training length, which in most cases does not reach even 10,000 tokens. Despite computation that grows only linearly with sequence length, RNNs have been unable to generalize along the sequence length. Soon enough, transformers and attention-based models came into the picture, and their advanced variations filled this vacuum. Recent transformer-based language models have demonstrated impressive capabilities in reasoning over long sequences with thousands and even millions of tokens. Although these models rely on quadratically scaling attention mechanisms, they became the default choice given their superior performance. This article discusses recent research that examines how RNNs reached this fate: we first diagnose why RNNs fell behind in this race and then discuss treatment strategies.
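To make the contrast concrete, here is a minimal NumPy sketch (not code from the paper) of why a recurrent model's per-token cost and state size stay constant while self-attention builds a score matrix that grows quadratically with the context length. The dimensions and matrices are invented purely for illustration.

```python
# Minimal sketch: constant-size recurrent state vs. quadratic attention.
import numpy as np

d_state, d_model, seq_len = 64, 32, 1024
rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(d_state, d_state))   # state transition (toy)
B = rng.normal(scale=0.1, size=(d_state, d_model))   # input projection (toy)
x = rng.normal(size=(seq_len, d_model))              # token embeddings (toy)

# Recurrent update: the state h has a fixed size, independent of seq_len,
# so per-token compute and memory stay constant as the context grows.
h = np.zeros(d_state)
for t in range(seq_len):
    h = np.tanh(A @ h + B @ x[t])

# Self-attention over the same inputs: the score matrix is seq_len x seq_len,
# so compute and memory scale quadratically with the context length.
scores = x @ x.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
attended = weights @ x

print(h.shape, weights.shape)   # (64,) vs. (1024, 1024)
```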

Researchers at Tsinghua University presented a paper examining RNN-based language models and the key problems that caused them to fall behind; they formalized these issues and introduced the concept of State Collapse. Additionally, they proposed mitigation methods to improve the length generalizability of RNNs.

The authors highlighted the anomalous behavior of RNNs when the context length exceeds the number of training tokens. The research also gave insights into the information constraints on the state: there are only so many tokens that a recurrent net can remember, and beyond this limit earlier tokens are forgotten, just as a student can cram only so much information the day before end-term examinations. And just as subpar exam performance can be traced back to a student's negligence throughout the semester, the authors attributed RNNs' generalization failure to a phenomenon they call state collapse.
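As a loose analogy (not taken from the paper), a fixed-size recurrent state behaves like a bounded buffer: once its capacity is exceeded, the earliest entries are lost. The toy snippet below only illustrates that intuition; the capacity value is arbitrary.

```python
# Toy analogy: a fixed-size state as a bounded buffer that drops old tokens.
from collections import deque

state_capacity = 5                      # hypothetical token capacity of the state
state = deque(maxlen=state_capacity)    # fixed-size "memory"

for token_id in range(12):              # feed 12 tokens into a 5-slot state
    state.append(token_id)

print(list(state))   # [7, 8, 9, 10, 11] -- tokens 0..6 have been "forgotten"
```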

The authors inspected the memory state distribution of the RNN over time and discovered that a few dominant outlier channels with exploding values caused the collapse: when the output hidden representation was normalized, these outliers drove the values in the other channels toward zero. Further, they showed that state collapse is caused by the RNN's inability to forget the earliest tokens and by state overparameterization with excessive state capacity, not by the prompt itself. Having diagnosed State Collapse and its root cause, the authors proposed three training-free mitigation methods and one method based on continual training to improve the length generalizability of RNNs. The three training-free methods were: Forget More and Remember Less, State Normalization, and Sliding Window by State Difference. These methods force the model to discard contextual information by reducing memory retention and insertion strength, normalizing the recurrent state, or reformulating the recurrence as an equivalent sliding-window state. Lastly, on the training side, they proposed data engineering that trains on context lengths exceeding the model's state capacity, together with state initialization via Truncated Backpropagation Through Time.
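The sketch below gives a hedged, illustrative reading of the State Normalization idea: capping the norm of the recurrent state so that a few exploding channels cannot dominate the rest. It is a toy interpretation built on a plain linear recurrence, not the authors' Mamba-2 implementation; the transition matrix, dimensions, and max_norm threshold are all invented for demonstration.

```python
# Illustrative interpretation of "State Normalization": rescale the recurrent
# state whenever its norm grows too large, so outlier channels cannot explode.
import numpy as np

def normalized_recurrent_step(h, x_t, A, B, max_norm=10.0):
    """One linear recurrent update followed by a norm cap on the state."""
    h = A @ h + B @ x_t                 # standard linear recurrent update
    norm = np.linalg.norm(h)
    if norm > max_norm:                 # cap the state norm (hypothetical threshold)
        h = h * (max_norm / norm)
    return h

rng = np.random.default_rng(1)
d_state, d_model = 16, 8
A = np.eye(d_state) * 1.05              # deliberately unstable transition
B = rng.normal(scale=0.1, size=(d_state, d_model))

h = np.zeros(d_state)
for t in range(10_000):                 # long sequence: unchecked, h would blow up
    h = normalized_recurrent_step(h, rng.normal(size=d_model), A, B)

print(np.linalg.norm(h))                # stays bounded by max_norm
```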

The authors experimented with various model sizes of Mamba-2 and mitigated state collapse on sequences of up to 1 million tokens. They also empirically estimated the state capacity of Mamba-2 on language modeling and on the passkey retrieval task. When the data engineering and state initialization tricks were applied, Mamba-2 showed remarkable performance: the 370M model achieved near-perfect passkey retrieval accuracy at a 256K context length, significantly outperforming transformer-based models of the same size in both retrieval accuracy and length generalizability, and became the smallest model with near-perfect passkey retrieval accuracy. The authors also established that state capacity is a linear function of the state size.
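For readers unfamiliar with the passkey retrieval task mentioned above, the snippet below sketches how such an evaluation is commonly set up: a random key is hidden inside long filler text and the model is asked to repeat it. The filler text, prompt wording, and generate_fn interface are hypothetical placeholders, not the paper's evaluation code.

```python
# Sketch of a passkey-retrieval style evaluation harness.
import random

def make_passkey_prompt(context_len_words: int, passkey: str) -> str:
    """Build a long filler context with a passkey hidden at a random position."""
    filler = "The grass is green. The sky is blue. The sun is bright. "
    reps = context_len_words // len(filler.split()) + 1
    words = (filler * reps).split()[:context_len_words]
    insert_at = random.randint(0, len(words))
    words.insert(insert_at, f"The pass key is {passkey}. Remember it.")
    return " ".join(words) + "\nWhat is the pass key? The pass key is"

def passkey_accuracy(generate_fn, context_len_words=10_000, trials=20) -> float:
    """Fraction of trials in which the model's output contains the hidden key."""
    hits = 0
    for _ in range(trials):
        passkey = str(random.randint(10_000, 99_999))
        prompt = make_passkey_prompt(context_len_words, passkey)
        if passkey in generate_fn(prompt):   # correct if the key is reproduced
            hits += 1
    return hits / trials

# Trivial "model" that just echoes the prompt, only to show the harness runs:
print(passkey_accuracy(lambda prompt: prompt, context_len_words=1_000, trials=5))
```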

This research shows that RNN-based long-context modeling has promising potential. Just as a student who crams the entire syllabus in one night needs an excellent teacher to excel in exams, RNNs need some care and teaching before and during training so that inference remains free of generalization error.


Check out the Paper. All credit for this research goes to the researchers of this project.

This article originally appeared on MarkTechPost.
