
    From Softmax to SSMax: Enhancing Attention and Key Information Retrieval in Transformers

    February 4, 2025

    Transformer-based language models process text by modeling relationships between all tokens rather than reading them in order. They rely on attention mechanisms to focus on key tokens, but handling longer text is challenging. The Softmax function, which turns attention scores into a probability distribution, flattens as the input size grows, causing what is known as attention fading: the model’s focus on important tokens shrinks, making it harder to learn from long texts. As the attention values become smaller and more uniform, key details are lost, rendering the model ineffective for larger inputs. Without a modification to the attention mechanism, the model fails to concentrate on essential information and therefore performs poorly on longer text inputs.
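
    To make attention fading concrete, here is a minimal, self-contained sketch (plain NumPy, illustrative values only, not code from the paper): one token is given a clearly higher attention logit than the rest, and the share of attention Softmax assigns to it is printed as the number of context tokens grows.

    ```python
    import numpy as np

    def softmax(z):
        """Standard Softmax over a vector of attention logits."""
        z = z - z.max()          # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    # One "key" token gets a clearly higher logit than the rest.
    # As the number of tokens n grows, the attention Softmax assigns
    # to that key token fades toward zero, even though its logit is fixed.
    for n in [16, 256, 4096, 65536]:
        logits = np.zeros(n)
        logits[0] = 5.0          # the important token
        attn = softmax(logits)
        print(f"n={n:>6}  attention on key token = {attn[0]:.4f}")
    ```

    With these toy numbers, the weight on the key token drops from roughly 0.9 at 16 tokens to well under 0.01 at 65,536 tokens, even though its logit never changes.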

    Current methods for improving length generalization in Transformer-based models include positional encoding schemes, sparse attention, extended training on longer texts, and enhanced attention mechanisms. These methods scale poorly and require substantial computational resources, making them inefficient for handling long inputs. Meanwhile, the Softmax function used to distribute attention in Transformers degrades as the input size grows: with more tokens, Softmax produces flatter probability distributions, decreasing the emphasis on key tokens. This attention fading severely limits the model’s ability to process long text.

    To mitigate attention fading in Transformers, a researcher from The University of Tokyo proposed Scalable-Softmax (SSMax), which modifies the Softmax function so that attention on important tokens is preserved even as the input size increases. Unlike Softmax, which spreads attention thinly as the input grows, SSMax adjusts its scaling based on the input size, ensuring that the highest score remains dominant and that focus on key information is not lost in larger contexts. Concretely, the formulation incorporates into the attention calculation a scaling factor that depends on the logarithm of the input size. As a result, the model concentrates attention on the most relevant tokens when the scores differ widely and distributes it more evenly when the scores are similar. SSMax integrates easily into existing architectures with minimal changes, requiring only a simple multiplication in the attention computation.
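
    Based on that description (an input-size-dependent, logarithmic scaling applied as a single extra multiplication), here is a minimal sketch of SSMax as a drop-in replacement for Softmax. The function names and the fixed scaling constant s are illustrative assumptions; the paper’s configurations treat the scaling parameter as learnable.

    ```python
    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def ssmax(z, s=1.0):
        """Scalable-Softmax sketch: multiply the logits by s * log(n)
        before the usual Softmax, where n is the input size. Here s is a
        fixed constant; in the model it would be a learned parameter."""
        n = z.shape[-1]
        return softmax(s * np.log(n) * z)

    # Same toy setup as before: one key token with a higher logit.
    for n in [16, 256, 4096, 65536]:
        logits = np.zeros(n)
        logits[0] = 5.0
        print(f"n={n:>6}  softmax={softmax(logits)[0]:.4f}  ssmax={ssmax(logits)[0]:.4f}")
    ```

    On this toy example, plain Softmax fades toward zero while SSMax keeps the key token’s attention close to 1.0 at every context length, which is the behavior described above.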

    To evaluate the impact of replacing Softmax with Scalable-Softmax (SSMax) in the attention layers, the researcher conducted experiments on training efficiency, long-context generalization, key information retrieval, and attention allocation. Six configurations were tested, including standard Softmax, SSMax with and without a scaling parameter, SSMax with a bias parameter, and two models where Softmax was replaced with SSMax after or during pretraining. SSMax consistently improved training efficiency and long-context generalization, reducing test loss across extended sequence lengths. The Needle-In-A-Haystack test showed that SSMax significantly enhanced key information retrieval in long contexts. However, removing the scaling parameter or adding a bias degraded performance. Models where Softmax was replaced with SSMax post-training or late in pretraining showed partial improvements but failed to match fully trained SSMax models.

    In summary, the proposed method improves Transformer attention by counteracting attention fading and strengthening length generalization, making models more effective on long-context tasks. Its adaptability benefits both newly trained and existing models, positioning it as a strong alternative to Softmax. Future work can optimize SSMax for efficiency and integrate it into emerging Transformer models to enhance long-context understanding in real-world applications.


    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post From Softmax to SSMax: Enhancing Attention and Key Information Retrieval in Transformers appeared first on MarkTechPost.
