
    ‘Inheritune’ by UT Austin Assists Efficient Language Model Training: Leveraging Inheritance and Reduced Data for Comparable Performance

    April 21, 2024

Scaling up LLMs presents significant challenges because of the immense computational resources and high-quality datasets required. Pre-training typically involves models with billions of parameters trained on datasets containing trillions of tokens, a procedure that demands substantial computational power and access to high-quality data to achieve strong performance in language understanding and generation tasks.

Researchers from UT Austin have developed “Inheritune,” a method for deriving smaller base LMs from larger ones. The smaller model inherits a few transformer blocks from a larger LM and is then trained on a tiny fraction (0.1%) of the original pretraining data. This approach yields an LM with 1.5 billion parameters using just 1 billion tokens and a single GPU in under 12 hours. Despite using significantly less data, the resulting models perform comparably to publicly available LMs trained on far larger datasets, demonstrating efficacy across various settings.

Previous approaches to training small base LMs involve either extensive training from scratch on trillions of tokens or the use of high-quality synthetic data. For instance, tinyllama-1B is trained from scratch on 3 trillion tokens over 90 days. In contrast, Inheritune efficiently trains small base LMs by inheriting transformer blocks from larger models and training on a small subset of data, achieving comparable performance with significantly fewer computational resources. While model compression techniques have been successful for neural networks in other domains, they have yet to prove as effective for the complex functions learned by large LMs.

In the Inheritune approach, a small base LM is crafted by inheriting a small fraction of the pre-training data and a few layers from an existing large LM. First, the first n layers of the reference model are copied to initialize the target model. The target model is then trained on the available subset of training data for a specified number of epochs. In their experiments, the researchers use a 1-billion-token subset of the RedPajama v1 dataset to train a 1.5-billion-parameter LM, achieving competitive performance against both scratch-trained and derived LMs. The approach is evaluated against various baseline models, with pre-training data quality taken into account to keep the comparison fair.
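
    As a concrete illustration, the initialization step might look like the sketch below, which uses Hugging Face transformers with GPT-2 checkpoints purely as a stand-in for the paper’s reference and target models; the model names, the layer count, and the exact modules copied here are illustrative assumptions, not the authors’ setup.

    # Minimal sketch of the layer-inheritance initialization (assumption: GPT-2
    # checkpoints stand in for the paper's reference/target models; the real
    # setup uses a larger reference LM and a RedPajama v1 subset).
    from transformers import GPT2Config, GPT2LMHeadModel

    N_INHERITED_LAYERS = 6  # hypothetical choice of the "first n" transformer blocks

    # Load the larger reference model (gpt2-medium has 24 transformer blocks).
    reference = GPT2LMHeadModel.from_pretrained("gpt2-medium")

    # Build a smaller target model with the same widths but only n blocks.
    target_config = GPT2Config.from_pretrained("gpt2-medium", n_layer=N_INHERITED_LAYERS)
    target = GPT2LMHeadModel(target_config)

    # Inherit the embeddings, the first n transformer blocks, and the final norm.
    target.transformer.wte.load_state_dict(reference.transformer.wte.state_dict())
    target.transformer.wpe.load_state_dict(reference.transformer.wpe.state_dict())
    for i in range(N_INHERITED_LAYERS):
        target.transformer.h[i].load_state_dict(reference.transformer.h[i].state_dict())
    target.transformer.ln_f.load_state_dict(reference.transformer.ln_f.state_dict())

    # The target is then trained with the usual causal-LM objective on the small
    # data subset (roughly 1B tokens in the paper) for a few epochs.

    The appeal of this recipe is that the target model starts from weights that already encode useful language structure, so the short training run on the small subset only has to adapt them rather than learn from scratch.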

Inheritune enables the extraction of smaller target LMs without sacrificing performance, showing comparable zero-shot results on relevant downstream tasks. Moreover, these LMs outperform similar-sized models trained from scratch, surpassing them after fewer training steps. Experiments with GPT2-medium models demonstrate that Inheritune initialization, particularly of the attention and MLP weights, yields faster convergence and a better final validation loss. Surprisingly, initializing either the attention or the MLP weights alone produces similar improvements in convergence speed and validation loss.
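
    To make that ablation concrete, here is a hedged sketch of attention-only (or MLP-only) initialization in the same GPT-2 stand-in setting as above; the helper name inherit_sublayers, the module paths, and the choice to leave layer norms at their default initialization are assumptions for illustration, not the authors’ code.

    # Sketch of the attention-only / MLP-only initialization ablation.
    # Module names follow GPT-2's block layout (ln_1, attn, ln_2, mlp).
    from transformers import GPT2Config, GPT2LMHeadModel

    def inherit_sublayers(target, reference, n_layers, which="attn"):
        """Copy only the chosen sub-layer weights of the first n blocks."""
        for i in range(n_layers):
            src, dst = reference.transformer.h[i], target.transformer.h[i]
            if which == "attn":
                dst.attn.load_state_dict(src.attn.state_dict())
            elif which == "mlp":
                dst.mlp.load_state_dict(src.mlp.state_dict())
            else:
                raise ValueError("which must be 'attn' or 'mlp'")

    # Example: attention-only initialization of the first 6 blocks.
    reference = GPT2LMHeadModel.from_pretrained("gpt2-medium")
    target = GPT2LMHeadModel(GPT2Config.from_pretrained("gpt2-medium", n_layer=6))
    inherit_sublayers(target, reference, n_layers=6, which="attn")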

Limitations of the Inheritune method include its inability to modify the architectural design beyond changing the number of transformer blocks, which limits flexibility in customizing hidden sizes and attention heads. Sensitivity to the quality of the training dataset is another concern given its small size. Additionally, the choice of which blocks to retain, dataset curation, and hyperparameter tuning remain open avenues for improvement. Nevertheless, the study concludes that Inheritune effectively pre-trains small base language models with minimal data and computational resources, offering a straightforward way to derive reduced models from large reference models.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

    The post ‘Inheritune’ by UT Austin Assists Efficient Language Model Training: Leveraging Inheritance and Reduced Data for Comparable Performance appeared first on MarkTechPost.
