
    Researchers from Fudan University Introduce Lorsa: A Sparse Attention Mechanism That Recovers Atomic Attention Units Hidden in Transformer Superposition

    May 7, 2025

    Large Language Models (LLMs) have gained significant attention in recent years, yet understanding their internal mechanisms remains challenging. When examining individual attention heads in Transformer models, researchers have identified specific functionalities in some heads, such as induction heads that predict tokens like ‘Potter’ following ‘Harry’ when the phrase appears in context. Ablation studies confirm these heads’ causal relationship to model behaviours. However, most attention heads distribute focus across diverse contexts without clear functionality. The challenge lies in interpreting these complex attention patterns, as heads often collaborate rather than operate in isolation. This phenomenon resembles feature superposition in neural network interpretability, suggesting the existence of attention superposition in Multi-Head Self-Attention (MHSA) mechanisms. Understanding these complex interactions is crucial for developing more transparent and controllable language models.

    Previous research has made significant strides in explaining individual attention head functionality using techniques like activation patching and path patching. These approaches have identified several specialised attention heads in transformer models, including composition heads, induction heads, name mover heads, number comparison heads, copy suppression heads, successor heads, and long context retrieval heads. However, the superposition hypothesis suggests that neurons relate to multiple non-orthogonal underlying features rather than single functionalities. Sparse Autoencoders have emerged as a promising method to extract overcomplete sets of sparse, linearly interpretable features from neural networks. The success of these autoencoders demonstrates the universality of superposition across model sizes, architecture types, and even modalities. These methods, while valuable, still struggle to fully explain the complex interactions between attention heads and their collaborative behaviour in language models.

    Researchers from the Shanghai Innovation Institute and the OpenMOSS Team at the School of Computer Science, Fudan University, introduce Low-Rank Sparse Attention (Lorsa), a robust approach to disentangling atomic attention units from attention superposition. Lorsa replaces standard Multi-Head Self-Attention with an overcomplete set of attention heads that feature single-dimensional OV circuits and sparsity constraints. To evaluate Lorsa, the researchers developed an exploration interface that provides comprehensive information on each Lorsa head, quantitatively assessing interpretability through top activations and attribution patterns. Results demonstrate that Lorsa’s monosemanticity compares favourably to Sparse Autoencoder features. The method was tested on both Pythia-160M and Llama-3.1-8B, successfully identifying known attention mechanisms such as induction heads, name mover heads, successor heads, and attention sinks. Further analysis revealed arithmetic-specific Lorsa heads in Llama-3.1-8B and identified thematic anchor heads exhibiting long-range, topic-specific attention patterns. This approach provides unprecedented visibility into transformer attention mechanisms.

    Attention superposition in Transformer models parallels how neurons represent more features than their dimensions. The research hypothesises that MHSA comprises multiple attention units in superposition, each attending between specific token pairs with interpretable read/write operations on the residual stream. This hypothesis suggests atomic attention units spread across multiple MHSA heads, while individual heads contain multiple units.

    Three key pieces of evidence support attention superposition. First, polysemantic heads respond to unrelated inputs, like successor heads that increment days and numbers while also exhibiting acronym and copying behaviours. Second, most attention heads lack clear interpretation patterns, with studies reporting failed interpretation attempts for over 90% of GPT-2 heads. Third, direct observations show attention output features collectively contributed by multiple heads, with approximately 25% of learned attention units spread across multiple MHSA heads.

    Understanding attention superposition matters for two key reasons. First, attribution-based circuit tracing becomes challenging when features are computed collectively, as individual Query-Key patterns may be misleading due to interference from other features within the same heads. Second, the structure of attention superposition may reveal important model biology motifs, raising the question of why certain attention units, like induction heads, are implemented by single MHSA heads while others exist in superposition.

    The Lorsa architecture addresses these challenges through several design choices. Lorsa is trained to predict MHSA outputs by minimising mean squared error. It employs one-dimensional OV circuits that restrict read/write operations to specific residual stream features, aligning with the linear representation hypothesis. For Query and Key weights, Lorsa shares parameters across groups of Lorsa heads, maintaining parameter efficiency while preserving performance. This keeps Lorsa QK circuits similar to MHSA while imposing sparsity constraints on each OV dimension.
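
    To make the setup concrete, here is a minimal PyTorch sketch of a Lorsa-style replacement layer, assuming rank-1 OV circuits per head and QK parameters shared across groups of heads; all names, shapes, hyperparameters, and initialisation choices are illustrative assumptions, not the authors’ implementation.

```python
# Illustrative sketch only: the exact QK-sharing scheme and initialisation are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LorsaHeads(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_qk_groups: int, d_qk: int):
        super().__init__()
        assert n_heads % n_qk_groups == 0
        self.heads_per_group = n_heads // n_qk_groups
        self.d_qk = d_qk
        # One-dimensional OV circuit per head: read direction w_v, write direction w_o.
        self.w_v = nn.Parameter(torch.randn(n_heads, d_model) * d_model ** -0.5)
        self.w_o = nn.Parameter(torch.randn(n_heads, d_model) * d_model ** -0.5)
        # QK circuits shared by groups of heads, keeping the parameter count manageable.
        self.w_q = nn.Parameter(torch.randn(n_qk_groups, d_model, d_qk) * d_model ** -0.5)
        self.w_k = nn.Parameter(torch.randn(n_qk_groups, d_model, d_qk) * d_model ** -0.5)

    def head_activations(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, seq, d_model) residual stream -> (batch, n_heads, seq) activations z."""
        b, t, _ = x.shape
        q = torch.einsum('btd,gdh->bgth', x, self.w_q)
        k = torch.einsum('bsd,gdh->bgsh', x, self.w_k)
        scores = torch.einsum('bgth,bgsh->bgts', q, k) / self.d_qk ** 0.5
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        pattern = scores.masked_fill(causal, float('-inf')).softmax(dim=-1)
        pattern = pattern.repeat_interleave(self.heads_per_group, dim=1)  # (b, n_heads, t, s)
        v = torch.einsum('bsd,nd->bns', x, self.w_v)  # scalar value per head and source position
        return torch.einsum('bnts,bns->bnt', pattern, v)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.head_activations(x)
        return torch.einsum('bnt,nd->btd', z, self.w_o)  # each head writes one direction back


# Training objective: reconstruct the frozen MHSA output of the layer being analysed.
# lorsa = LorsaHeads(d_model=768, n_heads=8192, n_qk_groups=64, d_qk=64)
# loss = F.mse_loss(lorsa(resid_pre), mhsa_output)
```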

    Lorsa employs orders of magnitude more heads than standard MHSA while activating only a small subset per token. For each position, Lorsa’s output aggregates only the top-K heads with the largest activation values, with the active head subset varying dynamically across token positions. This approach resembles TopK-SAEs, selecting the most salient linear components. While similar to attention Sparse Autoencoders, Lorsa differs in that its head activations derive from attention patterns of previous tokens rather than simple linear encoders with ReLU.
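
    The per-token sparsification can be expressed as a small post-processing step on the head activations from the sketch above; the selection criterion here is an assumption and omits whatever auxiliary machinery the full method uses.

```python
import torch


def topk_lorsa_output(z: torch.Tensor, w_o: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the K most active Lorsa heads at each token position (illustrative).

    z:   (batch, n_heads, seq) head activations from LorsaHeads.head_activations
    w_o: (n_heads, d_model)    per-head write directions
    """
    idx = z.topk(k, dim=1).indices                     # indices of the top-K heads per position
    sparse_z = torch.zeros_like(z).scatter_(1, idx, z.gather(1, idx))
    return torch.einsum('bnt,nd->btd', sparse_z, w_o)  # aggregate only the active heads
```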

    Lorsa’s interpretability assessment employs several key metrics to understand individual head functionality. Top activations help identify patterns by examining the 16 highest-activating tokens for each Lorsa head across 100 million samples of held-out data. The z pattern analysis decomposes activations linearly into token-wise contributions from preceding positions, revealing which previous tokens contribute to the current activation. This approach parallels the direct feature attribution analysis used for attention Sparse Autoencoders, but the attribution is simpler, involving a single one-dimensional OV circuit and a single QK circuit.
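
    Because the OV circuit is rank-1, the z pattern decomposition reduces to an elementwise product of a head’s attention row with its scalar value reads. A hypothetical helper along these lines (names and shapes assumed, not taken from the paper’s code):

```python
import torch


def z_pattern(x: torch.Tensor, pattern: torch.Tensor, w_v_head: torch.Tensor, t: int) -> torch.Tensor:
    """Token-wise contributions of preceding positions to one Lorsa head's activation at position t.

    x:        (seq, d_model) residual stream for a single sequence
    pattern:  (seq, seq)     this head's attention pattern
    w_v_head: (d_model,)     the head's one-dimensional value (read) direction
    """
    v = x @ w_v_head              # scalar read at each source position
    contribs = pattern[t] * v     # contribution of each preceding token to z_t
    return contribs               # contribs.sum() recovers the head's activation at t
```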

    A visualisation dashboard provides comprehensive information about each Lorsa head. For example, a “you”-specific induction head shows several important patterns: it primarily reads from features indicating that the current token is “you”/“your” through its weight vector, strongly activates a “say you” feature that amplifies the logit of “you”, and increases prediction probabilities for various “you” tokens. The QK attention pattern computation involves current token features at the query position and previous token features where the current token is “you”, with the previous token often being a word like “with”, “thank”, or “do”. Interestingly, this particular Lorsa head is almost equally distributed between two MHSA heads (5.0 and 5.7), demonstrating how Lorsa successfully disentangles attention units that exist across multiple standard attention heads.

    Results confirm Lorsa’s effectiveness in identifying known attention mechanisms across different models. Using path patching, the researchers rediscovered previously documented monosemantic heads in Pythia-160M, including induction heads, name mover heads, copy suppression heads, successor heads, and attention sinks. In Llama-3.1-8B, they identified arithmetic-specific Lorsa heads that activate during simple arithmetic operations, with each head using distinct heuristics to fetch operands. In addition, they discovered “thematic anchor” heads that exhibit long-range attention to topically related tokens, suggesting a mechanism for maintaining persistent topic representations that bias subsequent token predictions toward domain-appropriate vocabulary and structures.

    Low-Rank Sparse Attention successfully disentangles atomic attention units from attention superposition in Transformer models. The method effectively recovers known attention mechanisms while uncovering new interpretable behaviours, demonstrating its value for neural network interpretability. Despite these advances, significant challenges remain in unbinding QK circuits to achieve fully independent heads and reducing superposition effects. Future research directions include exploring low-dimensional QK structures, cross-layer superposition, and systematic Q/K/V composition. 


    Check out the Paper, the model on Hugging Face, and the GitHub page for more details.
