    This Machine Learning Research Introduces Mechanistic Architecture Design (MAD) Pipeline: Encompassing Small-Scale Capability Unit Tests Predictive of Scaling Laws

    April 6, 2024

    Designing deep learning architectures is resource-intensive: the design space is vast, prototyping cycles are long, and training and evaluating models at scale is computationally expensive. Despite progress on automated neural architecture search, the combinatorial explosion of possible designs and the lack of reliable prototyping pipelines mean that architectural improvements still emerge from an opaque development process guided by heuristics and individual experience rather than systematic procedure. The high cost and long iteration time of training and testing new designs make the need for principled, agile design pipelines all the more pressing.

    Despite the abundance of possible architectural designs, most models are variants of a standard Transformer recipe that alternates between memory-based mixers (self-attention layers) and memoryless mixers (shallow FFNs). This specific set of computational primitives, inherited from the original Transformer design, is known to improve quality, and empirical evidence suggests that each primitive excels at a different sub-task within sequence modeling, such as context recall versus factual recall.

    Researchers from Together AI, Stanford University, Hessian AI, RIKEN, Arc Institute, CZ Biohub, and Liquid AI investigate architecture optimization, from scaling laws down to synthetic tasks that probe specific model capabilities. They introduce mechanistic architecture design (MAD), an approach for rapid architecture prototyping and testing. MAD comprises a set of synthetic tasks, such as compression, memorization, and recall, selected to act as isolated unit tests for critical architectural capabilities and requiring only minutes of training time. The MAD tasks draw on recent progress in understanding sequence models such as Transformers through the sequence-manipulation skills they must acquire, including in-context learning and recall.
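
    To make the unit-test idea concrete, below is a minimal sketch of what a MAD-style synthetic task could look like, using associative recall as the example. The task format, vocabulary split, and sizes are illustrative assumptions, not the paper's exact setup.

    ```python
    # Illustrative MAD-style synthetic unit test: in-context associative recall.
    # A model sees interleaved key-value pairs followed by a query key and must
    # predict the value paired with that key. Sizes here are arbitrary choices.
    import torch

    def make_recall_batch(batch_size=32, num_pairs=8, vocab_size=64, seed=0):
        """Return (sequences, targets): each sequence is k1 v1 k2 v2 ... kq."""
        g = torch.Generator().manual_seed(seed)
        seqs, targets = [], []
        for _ in range(batch_size):
            # keys from the lower half of the vocabulary, values from the upper half
            keys = torch.randperm(vocab_size // 2, generator=g)[:num_pairs]
            vals = torch.randint(vocab_size // 2, vocab_size, (num_pairs,), generator=g)
            q = torch.randint(num_pairs, (1,), generator=g).item()
            seq = torch.stack([keys, vals], dim=1).flatten()   # k1 v1 k2 v2 ...
            seqs.append(torch.cat([seq, keys[q:q + 1]]))       # append the query key
            targets.append(vals[q])
        return torch.stack(seqs), torch.stack(targets)

    x, y = make_recall_batch()
    print(x.shape, y.shape)  # torch.Size([32, 17]) torch.Size([32])
    ```

    Because a task like this trains in minutes on a small model, hundreds of candidate architectures can be screened before any at-scale run is committed.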

    Using MAD, the team evaluates designs built from both well-known and less familiar computational primitives, including gated convolutions, gated input-varying linear recurrences, and additional operators such as mixtures of experts (MoEs). They use MAD as a filter to identify promising architecture candidates. This has led to the discovery and validation of several design-optimization strategies, such as striping: building hybrid architectures by sequentially interleaving blocks made of different computational primitives with a predetermined connection topology, as sketched below.
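
    The following sketch shows the striping pattern in PyTorch. The real candidates stripe primitives like gated convolutions and input-varying linear recurrences; the plain attention and GRU blocks here are stand-ins chosen only to illustrate the fixed interleaving topology.

    ```python
    # Minimal sketch of "striping": interleaving blocks built from different
    # computational primitives in a fixed connection topology.
    import torch
    import torch.nn as nn

    class AttnBlock(nn.Module):
        """Memory-based mixer stand-in: pre-norm self-attention with residual."""
        def __init__(self, d):
            super().__init__()
            self.norm = nn.LayerNorm(d)
            self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

        def forward(self, x):
            h = self.norm(x)
            out, _ = self.attn(h, h, h)
            return x + out

    class RecurrentBlock(nn.Module):
        """Fixed-state mixer stand-in: pre-norm GRU with residual."""
        def __init__(self, d):
            super().__init__()
            self.norm = nn.LayerNorm(d)
            self.rnn = nn.GRU(d, d, batch_first=True)

        def forward(self, x):
            out, _ = self.rnn(self.norm(x))
            return x + out

    def striped_model(d=128, pattern=("R", "R", "A") * 2):
        """Fixed topology: e.g. two recurrent blocks per attention block."""
        blocks = {"A": AttnBlock, "R": RecurrentBlock}
        return nn.Sequential(*[blocks[p](d) for p in pattern])

    x = torch.randn(2, 16, 128)
    print(striped_model()(x).shape)  # torch.Size([2, 16, 128])
    ```

    The design choice striping exploits is that different primitives specialize in different sub-tasks, so a hybrid with the right mixing ratio can beat a homogeneous stack of either primitive alone.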

    To connect MAD synthetics to real-world scaling, the researchers train 500 language models with diverse architectures and 70 million to 7 billion parameters, the broadest scaling-law analysis of emerging architectures to date. Their protocol builds on compute-optimal scaling rules established for LSTMs and Transformers. Overall, hybrid designs outperform their non-hybrid counterparts in scaling, reducing pretraining loss across a range of FLOP budgets at the compute-optimal frontier. Their work also shows that the novel architectures are more resilient in extensive pretraining runs outside the optimal frontier.
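
    At its core, a scaling-law analysis of this kind fits power laws of loss against compute and compares the fitted frontiers across architectures. The sketch below fits a pure power law, loss ≈ a·C^b, to invented data points; the paper's protocol and fitted constants are of course its own.

    ```python
    # Toy power-law fit of pretraining loss vs. compute, the core operation in a
    # scaling-law analysis. The data points below are invented for illustration.
    import numpy as np

    flops = np.array([1e18, 3e18, 1e19, 3e19, 1e20])   # compute budgets C
    loss = np.array([3.10, 2.85, 2.62, 2.48, 2.35])    # made-up pretraining losses

    # A pure power law loss = a * C**b is linear in log-log space.
    b, log_a = np.polyfit(np.log(flops), np.log(loss), 1)
    a = np.exp(log_a)
    print(f"loss ~= {a:.2f} * C^{b:.3f}")              # b comes out negative
    print("extrapolated loss at 1e21 FLOPs:", a * 1e21 ** b)
    ```

    Comparing fitted exponents and offsets across architectures is what places the hybrids ahead of their non-hybrid counterparts at the compute-optimal frontier.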

    The size of the state, the analogue of the KV cache in standard Transformers, is an important factor in MAD and in the scaling analysis: it determines inference efficiency and memory cost, and likely has a direct effect on recall capability. The team presents a state-optimal scaling methodology to estimate how perplexity scales with the state dimension of different model designs, and they identify hybrid designs that strike a good compromise between perplexity, state dimension, and compute requirements.
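
    The back-of-the-envelope comparison below shows why state size separates architecture families at inference time: an attention layer's KV cache grows linearly with context length, while a fixed-dimension recurrence does not. The layer counts and dimensions are illustrative, not taken from the paper.

    ```python
    # Rough comparison of inference-time state: KV cache vs. fixed recurrent state.
    # All dimensions below are illustrative assumptions.

    def kv_cache_floats(layers, heads, head_dim, seq_len):
        # keys + values stored per token, per head, per layer
        return 2 * layers * heads * head_dim * seq_len

    def recurrent_state_floats(layers, state_dim):
        # one fixed-size state vector per layer, independent of context length
        return layers * state_dim

    for seq_len in (1_024, 32_768):
        attn = kv_cache_floats(layers=32, heads=32, head_dim=128, seq_len=seq_len)
        rec = recurrent_state_floats(layers=32, state_dim=4_096)
        print(f"seq_len={seq_len:>6}: KV cache {attn:,} floats vs recurrent state {rec:,}")
    ```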

    By combining MAD with the newly developed computational primitives, they arrive at state-of-the-art hybrid architectures that achieve 20% lower perplexity at the same compute budget as the best Transformer, convolutional, and recurrent baselines (Transformer++, Hyena, Mamba).

    These findings have significant implications for machine learning and artificial intelligence. By demonstrating that a well-chosen set of MAD synthetic tasks can accurately forecast scaling-law performance, the team opens the door to faster, automated architecture design. The correspondence is strongest for models within the same architectural class, where MAD accuracy is closely associated with compute-optimal perplexity at scale.
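
    That predictive claim is, at bottom, a statement about rank correlation: cheap synthetic-task scores should order architectures the same way expensive compute-optimal perplexities do. The sketch below checks such a correlation on invented scores; the real numbers live in the paper.

    ```python
    # If MAD unit-test accuracy ranks architectures the way compute-optimal
    # perplexity does, the rank correlation is strongly negative (higher
    # accuracy, lower perplexity). All scores below are invented.
    from scipy.stats import spearmanr

    architectures = ["Transformer++", "Hyena", "Mamba", "hybrid-A", "hybrid-B"]
    mad_accuracy = [0.81, 0.74, 0.86, 0.90, 0.93]   # aggregate synthetic-task accuracy
    optimal_ppl = [11.2, 12.6, 10.8, 10.1, 9.7]     # compute-optimal perplexity

    rho, p = spearmanr(mad_accuracy, optimal_ppl)
    print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
    ```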

    Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

    From the announcement by Michael Poli (@MichaelPoli6), March 28, 2024:

    “New research on mechanistic architecture design and scaling laws.

    – We perform the largest scaling laws analysis (500+ models, up to 7B) of beyond Transformer architectures to date

    – For the first time, we show that architecture performance on a set of isolated token… pic.twitter.com/khJAXnvwWA”

    The post This Machine Learning Research Introduces Mechanistic Architecture Design (MAD) Pipeline: Encompassing Small-Scale Capability Unit Tests Predictive of Scaling Laws appeared first on MarkTechPost.
