    This AI Paper by Toyota Research Institute Introduces SUPRA: Enhancing Transformer Efficiency with Recurrent Neural Networks

    May 17, 2024

Natural language processing (NLP) has advanced significantly thanks to neural networks, with transformer models setting the standard. These models perform remarkably well across a wide range of benchmarks. However, their high memory requirements and computational expense pose serious problems, particularly for applications that demand long-context processing. This persistent limitation motivates the search for more efficient alternatives that maintain performance while requiring fewer resources.

The main issue with transformer models is their high memory and processing requirements. Although these models perform well on NLP tasks, they are often impractical in resource-constrained settings. This difficulty highlights the need for models with lower computational overhead that can deliver comparable or better performance than current ones. Resolving this issue is essential to making modern NLP technology more usable and accessible across a wide range of applications.

    Existing research includes Linear Transformers, which aim to improve efficiency over softmax transformers. The RWKV model and RetNet offer competitive performance with linear attention mechanisms. State-space models like H3 and Hyena integrate recurrent and convolutional networks for long-sequence tasks. Methods such as Performers, Cosformer, and LUNA focus on enhancing transformer efficiency. The Griffin model combines sliding window and linear attention techniques.

    Researchers from the Toyota Research Institute have introduced Scalable UPtraining for Recurrent Attention (SUPRA), a method to convert pre-trained transformers into recurrent neural networks (RNNs). This approach leverages high-quality pre-training data from transformers while employing a linearization technique that replaces softmax normalization with GroupNorm. SUPRA is unique as it combines the strengths of transformers and RNNs, achieving competitive performance with reduced computational cost.
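
To make the linearization concrete, here is a minimal PyTorch-style sketch of the core substitution: the softmax attention of a pre-trained layer is replaced by a kernelized (linear) attention whose output is normalized with GroupNorm instead of the softmax denominator. The module layout, feature map, and dimensions are illustrative assumptions, not the exact SUPRA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearizedAttentionHead(nn.Module):
    """Simplified single-head linear attention with GroupNorm output
    normalization (an illustrative stand-in for a SUPRA-style layer;
    the feature map and sizes are assumptions, not the paper's code)."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_head)
        self.k_proj = nn.Linear(d_model, d_head)
        self.v_proj = nn.Linear(d_model, d_head)
        # Small MLP applied to queries and keys, mirroring the extra
        # projection the method introduces during uptraining.
        self.qk_mlp = nn.Sequential(nn.Linear(d_head, d_head), nn.GELU(),
                                    nn.Linear(d_head, d_head))
        # GroupNorm replaces the softmax denominator as the normalizer.
        self.norm = nn.GroupNorm(num_groups=1, num_channels=d_head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = F.elu(self.qk_mlp(self.q_proj(x))) + 1.0  # positive feature map
        k = F.elu(self.qk_mlp(self.k_proj(x))) + 1.0
        v = self.v_proj(x)
        # Causal linear attention: running sum of k_t v_t^T outer products.
        kv = torch.einsum("btd,bte->btde", k, v).cumsum(dim=1)
        out = torch.einsum("btd,btde->bte", q, kv)    # unnormalized read-out
        # Normalize over the feature dimension with GroupNorm, not softmax.
        b, t, e = out.shape
        return self.norm(out.reshape(b * t, e)).reshape(b, t, e)
```

This parallel form materializes the prefix state at every timestep purely for clarity; an efficient implementation would compute the same quantity with a chunked scan.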

The SUPRA methodology uptrains existing transformers such as Llama2 and Mistral-7B. The process replaces softmax normalization with GroupNorm and adds a small multi-layer perceptron (MLP) for projecting queries and keys. The models were trained on the RefinedWeb dataset (1.2 trillion tokens). Training and fine-tuning used a modified version of OpenLM, and evaluations were conducted with the Eleuther evaluation harness on standard NLU benchmarks. This approach allows the converted transformers to operate recurrently and efficiently, handling both short- and long-context tasks.
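
Because the attention is linear, the same layer can run as an RNN at inference time: a fixed-size state matrix accumulates key-value outer products, so per-token memory stays constant regardless of context length. The snippet below sketches that recurrent read-out under the same assumed shapes and names as above; it is an illustration, not the paper's decoding code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recurrent_step(state, q_t, k_t, v_t, norm):
    """One decoding step of linearized attention (illustrative only).
    `state` is a (d_head, d_head) matrix of accumulated k v^T outer
    products, so memory per generated token is constant."""
    state = state + torch.outer(k_t, v_t)        # update the running state
    out = q_t @ state                            # read the state with the query
    out = norm(out.unsqueeze(0)).squeeze(0)      # GroupNorm instead of softmax
    return state, out

# Usage sketch: decode token by token with O(1) attention memory.
d_head = 64
norm = torch.nn.GroupNorm(1, d_head)
state = torch.zeros(d_head, d_head)
for _ in range(8):                               # stand-in decoding loop
    q_t = F.elu(torch.randn(d_head)) + 1.0       # positive features (assumed map)
    k_t = F.elu(torch.randn(d_head)) + 1.0
    v_t = torch.randn(d_head)
    state, out_t = recurrent_step(state, q_t, k_t, v_t, norm)
```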

The SUPRA method showed competitive performance on various benchmarks. It outperformed RWKV and RetNet on the HellaSwag benchmark, achieving a score of 77.9 compared to 70.9 and 73.0, respectively. The model also demonstrated strong results on other tasks, with scores of 76.3 on ARC-E, 79.1 on ARC-C, and 46.3 on MMLU. Uptraining required only 20 billion tokens, significantly fewer than other models need. Despite some performance drop on long-context tasks, SUPRA maintained robust results within its training context length.

    In conclusion, the SUPRA method successfully converts pre-trained transformers into efficient RNNs, addressing the high computational costs of traditional transformers. By replacing softmax normalization with GroupNorm and using a small MLP, SUPRA models achieve competitive performance on benchmarks like HellaSwag and ARC-C with significantly reduced training data. This research highlights the potential for scalable, cost-effective NLP models, maintaining robust performance across various tasks and paving the way for more accessible advanced language processing technologies.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

    The post This AI Paper by Toyota Research Institute Introduces SUPRA: Enhancing Transformer Efficiency with Recurrent Neural Networks appeared first on MarkTechPost.
