
    This AI Paper Introduces a Short KL+MSE Fine-Tuning Strategy: A Low-Cost Alternative to End-to-End Sparse Autoencoder Training for Interpretability

    April 5, 2025

Sparse autoencoders are central tools for analyzing how large language models function internally. By translating complex internal states into interpretable components, they let researchers break neural activations down into parts that make sense to humans. These methods support tracing logic paths and identifying how particular tokens or phrases influence model behavior. Sparse autoencoders are especially valuable for interpretability applications, including circuit analysis, where understanding what each neuron contributes is crucial to ensuring trustworthy model behavior.
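
For readers unfamiliar with the setup, the sketch below shows a minimal TopK sparse autoencoder of the kind discussed here; the dimensions, layer names, and the TopK activation are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Minimal TopK sparse autoencoder: projects model activations into a wide,
    sparse feature space and reconstructs them from the few active features."""
    def __init__(self, d_model: int = 768, d_features: int = 16384, k: int = 80):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)
        self.k = k

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        pre = self.encoder(acts)                      # feature pre-activations
        top = torch.topk(pre, self.k, dim=-1)         # keep only the k largest features
        sparse = torch.zeros_like(pre).scatter_(-1, top.indices, top.values)
        return self.decoder(sparse)                   # reconstruction of the input activations

# Usage: recon = TopKSAE()(acts) for a batch of activations of shape [batch, 768].
```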

    A pressing issue with sparse autoencoder training lies in aligning training objectives with how performance is measured during model inference. Traditionally, training uses mean squared error (MSE) on precomputed model activations. However, this doesn’t optimize for cross-entropy loss, which is used to judge performance when reconstructed activations replace the originals. This mismatch results in reconstructions that perform poorly in real inference settings. More direct methods that train on both MSE and KL divergence solve this issue, but they demand considerable computation, which limits their adoption in practice.
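
The mismatch can be made concrete with a rough PyTorch sketch, assuming a hook-based way of splicing reconstructions back into the forward pass; the function names and hook mechanics here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def mse_objective(sae, acts):
    # Standard training signal: reconstruct precomputed (often shuffled) activations.
    return F.mse_loss(sae(acts), acts)

def kl_objective(model, sae, tokens, layer):
    # Inference-style signal: splice reconstructions back into the model and
    # compare its output distribution with the unmodified model's.
    with torch.no_grad():
        clean_logits = model(tokens)                  # original forward pass
    def splice(module, inputs, output):
        return sae(output)                            # replace activations with reconstructions
    handle = layer.register_forward_hook(splice)
    spliced_logits = model(tokens)                    # forward pass through the spliced model
    handle.remove()
    return F.kl_div(F.log_softmax(spliced_logits, dim=-1),
                    F.softmax(clean_logits, dim=-1),
                    reduction="batchmean")
```

Minimizing the first quantity does not guarantee a small value of the second, which is what end-to-end and KL+MSE methods target directly.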

Several approaches have attempted to improve sparse autoencoder training. Full end-to-end training that combines KL divergence and MSE losses offers better reconstruction quality, but its computational cost can be up to 48× higher due to the extra forward passes and the loss of activation amortization. An alternative uses LoRA adapters to fine-tune the base language model around a fixed autoencoder. While efficient, this method modifies the model itself, which isn’t ideal for applications that require analyzing the unaltered architecture.
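
As a point of contrast, the LoRA route keeps the autoencoder frozen and adds trainable low-rank corrections to the base model's own weights; a minimal sketch of one such adapter (rank, initialization, and names are assumptions):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update: y = Wx + B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # the original weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T) @ self.B.T
```

Training such adapters against a KL objective lets the language model absorb the autoencoder's reconstruction error, but, as noted above, it means the analyzed model is no longer the original one.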

An independent researcher from DeepMind has introduced a new solution that applies a brief KL+MSE fine-tuning step at the tail end of training, covering only the final 25 million tokens, just 0.5–10% of the usual training data volume. The experiments use models from the Gemma and Pythia families. The method avoids altering the model architecture and minimizes complexity while achieving performance similar to full end-to-end training. It also allows training-time savings of up to 90% in scenarios with large models or amortized activation collection, without requiring additional infrastructure or algorithmic changes.

    To implement this, the training begins with standard MSE on shuffled activations, followed by a short KL+MSE fine-tuning phase. This phase uses a dynamic balancing mechanism to adjust the weight of KL divergence relative to MSE loss. Instead of manually tuning a fixed β parameter, the system recalculates the KL scaling factor per training batch. The formula ensures the total combined loss maintains the same scale as the original MSE loss. This dynamic control prevents the need for additional hyperparameters and simplifies transfer across model types. Fine-tuning is executed with a linear decay of the learning rate from 5e-5 to 0 over the 25M token window, aligning the process with practical compute budgets and preserving sparsity settings from earlier training.
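
A sketch of how that fine-tuning phase might be wired up, with the per-batch KL rescaling and linear learning-rate decay described above; the specific rescaling formula (matching the combined loss to the scale of the MSE term) and the data-loader interface are assumptions, not the paper's exact code:

```python
import torch
import torch.nn.functional as F

def kl_mse_finetune(sae, model, layer, batches, steps, lr=5e-5):
    """Short KL+MSE fine-tuning pass over roughly the final 25M tokens.
    `batches` is assumed to yield (tokens, activations) pairs; `kl_objective`
    is the splicing loss sketched earlier."""
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for step, (tokens, acts) in zip(range(steps), batches):
        mse = F.mse_loss(sae(acts), acts)
        kl = kl_objective(model, sae, tokens, layer)
        # Recompute the KL weight each batch so the combined loss keeps the
        # scale of the plain MSE loss, instead of hand-tuning a fixed beta.
        scale = (mse / kl).detach()
        loss = 0.5 * (mse + scale * kl)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Linear learning-rate decay from 5e-5 to 0 over the fine-tuning window.
        for group in opt.param_groups:
            group["lr"] = lr * (1 - (step + 1) / steps)
```

Because the KL weight is recomputed from the current batch, the same recipe transfers across model types without introducing a new hyperparameter.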

Performance evaluations show that this approach reduced the cross-entropy loss gap by 20% to 50%, depending on the sparsity setting. For example, on Pythia-160M with K=80, the KL+MSE fine-tuned model performed slightly better than a fully end-to-end-trained model while requiring 50% less wall-clock time. At K=160, the fine-tuned MSE-only model achieved similar or marginally better outcomes than KL+MSE, possibly due to the simplicity of the objective. Tests with LoRA and linear adapters revealed that their benefits do not stack, as each method corrects a shared error source in MSE-trained autoencoders. Even very low-rank LoRA adapters (rank 2) captured over half the performance gains of full fine-tuning.

    Although cross-entropy results consistently favored the fine-tuned method, interpretability metrics showed mixed trends. On SAEBench, ReLU-based sparse autoencoders saw improvements in sparse probing and RAVEL metrics, while performance on spurious correlation and targeted probe tasks dropped. TopK-based models showed smaller, more inconsistent changes. These results suggest that fine-tuning may yield reconstructions better aligned with model predictions but may not always enhance interpretability, depending on the specific evaluation task or architecture type.

    This research underscores a meaningful advancement in sparse autoencoder training: a computationally light, technically simple method that improves reconstruction accuracy without modifying base models. It addresses key alignment issues in training objectives and delivers practical results across models and sparsity levels. While not uniformly superior in all interpretability metrics, it offers a favorable trade-off between performance and simplicity for tasks like circuit-level analysis.


Check out the Paper. All credit for this research goes to the researchers of this project.


    The post This AI Paper Introduces a Short KL+MSE Fine-Tuning Strategy: A Low-Cost Alternative to End-to-End Sparse Autoencoder Training for Interpretability appeared first on MarkTechPost.

