    Imbue Team Trains 70B-Parameter Model From Scratch: Innovations in Pre-Training, Evaluation, and Infrastructure for Advanced AI Performance

    June 28, 2024

    The Imbue Team recently undertook an ambitious project to train a 70-billion-parameter language model from scratch, reporting significant milestones in both model performance and evaluation methodology. The team focused on creating a model that outperforms GPT-4 in zero-shot settings across various reasoning and coding benchmarks, despite being pre-trained on only 2 trillion tokens, far fewer than the datasets used by comparable models.

    The initiative addressed several critical questions in artificial intelligence and machine learning. One primary goal was to explore the practical requirements for building robust agents capable of writing and implementing reliable code. The team also sought to understand the benefits of pre-training over fine-tuning and other post-training techniques, and to gauge how engineering optimizations in infrastructure, hardware, data, and evaluation contribute to a robust, accurate model.

    The Imbue Team employed CARBS, a cost-aware hyperparameter optimizer that was pivotal in scaling their system to 70 billion parameters with minimal training instability. CARBS allowed the team to tune all hyperparameters systematically, helping ensure strong performance at any model size. This approach was crucial in mitigating the risks of training large models, particularly for smaller teams experimenting with novel architectures.
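
    To make the cost-aware idea concrete, here is a minimal, self-contained Python sketch. It is not CARBS itself (which Imbue has released), only a toy illustration of the principle: every candidate configuration is judged on both validation loss and compute cost, and the search reports the Pareto frontier between the two. All functions, names, and numbers below are hypothetical stand-ins for a real training run.

        # Toy illustration of cost-aware hyperparameter search (NOT the CARBS
        # implementation): candidates are judged on performance *and* cost,
        # and the search tracks the cost/performance Pareto frontier.
        import math
        import random

        random.seed(0)

        def train_cost(width: int, steps: int) -> float:
            """Toy proxy for compute cost (grows with model width and steps)."""
            return width ** 2 * steps

        def validation_loss(lr: float, width: int, steps: int) -> float:
            """Toy stand-in for a real run's validation loss: bigger/longer
            is better, and the learning rate has a sweet spot."""
            return 1.0 / math.log(width * steps) + (math.log10(lr) + 3.0) ** 2 * 0.05

        def pareto_front(trials):
            """Keep trials not dominated in (cost, loss) by any other trial."""
            front = []
            for t in trials:
                dominated = any(o["cost"] <= t["cost"] and o["loss"] < t["loss"]
                                for o in trials if o is not t)
                if not dominated:
                    front.append(t)
            return sorted(front, key=lambda t: t["cost"])

        trials = []
        for _ in range(200):
            cfg = {
                "lr": 10 ** random.uniform(-4.5, -1.5),
                "width": random.choice([256, 512, 1024, 2048]),
                "steps": random.choice([1_000, 5_000, 20_000]),
            }
            trials.append({**cfg,
                           "cost": train_cost(cfg["width"], cfg["steps"]),
                           "loss": validation_loss(cfg["lr"], cfg["width"], cfg["steps"])})

        for t in pareto_front(trials):
            print(f"cost={t['cost']:.2e}  loss={t['loss']:.4f}  "
                  f"lr={t['lr']:.1e}  width={t['width']}  steps={t['steps']}")

    A production optimizer such as CARBS replaces the random sampling with a model-based search over the cost/performance surface, but the frontier bookkeeping shown here is the essential ingredient.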


    The project also emphasized the importance of clean evaluation datasets. The team updated and shared datasets to facilitate the accurate assessment of models on reasoning and coding tasks. This step was essential in ensuring that models achieved nearly 100% accuracy on unambiguous questions, thereby setting a high standard for evaluation. Additionally, the team released infrastructure scripts and best practices to assist other teams in training large language models efficiently, reducing the need to reproduce complex infrastructure code and knowledge from scratch.
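
    The sketch below illustrates that evaluation-hygiene workflow under stated assumptions: each item carries a per-item ambiguity score derived from human annotators (a hypothetical schema, not the released datasets' actual format), ambiguous items are filtered out, and a model is scored only on the unambiguous remainder, where near-100% accuracy becomes a meaningful target.

        # Hedged sketch of evaluation hygiene: drop items humans flag as
        # ambiguous, then score a model only on the unambiguous remainder.
        # Dataset fields and the threshold are hypothetical.
        from dataclasses import dataclass

        @dataclass
        class EvalItem:
            question: str
            answer: str
            # Fraction of (hypothetical) annotators who judged the item ambiguous.
            ambiguity_score: float

        def clean_split(items, max_ambiguity: float = 0.2):
            """Separate unambiguous items from ones annotators disagreed on."""
            keep = [it for it in items if it.ambiguity_score <= max_ambiguity]
            drop = [it for it in items if it.ambiguity_score > max_ambiguity]
            return keep, drop

        def accuracy(model, items):
            """Exact-match accuracy; `model` is any callable question -> str."""
            if not items:
                return 0.0
            return sum(model(it.question) == it.answer for it in items) / len(items)

        # Example usage with a toy dataset and a trivial "model".
        dataset = [
            EvalItem("2 + 2 = ?", "4", ambiguity_score=0.0),
            EvalItem("Is a hot dog a sandwich?", "yes", ambiguity_score=0.9),
        ]
        clean, ambiguous = clean_split(dataset)
        print(f"kept {len(clean)} items, dropped {len(ambiguous)} ambiguous ones")
        print("accuracy on clean set:", accuracy(lambda q: "4", clean))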

    Notable outcomes of the project include a new code-focused reasoning benchmark and a dataset of 450,000 human judgments about ambiguity. These resources are designed to help other researchers and developers build and evaluate their models more effectively. By sharing these tools and insights, the Imbue Team aims to lower the barrier to entry for large-scale model training and to encourage innovation in the field.
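
    As a companion to the previous sketch, here is one hedged guess at how per-item ambiguity scores could be aggregated from raw human judgments like the 450,000 the team released; the (item_id, is_ambiguous) pair format is an assumption for illustration, not the dataset's actual schema.

        # Hypothetical aggregation of raw human judgments into per-item
        # ambiguity scores; the input format is assumed, not the real schema.
        from collections import defaultdict

        def aggregate_judgments(judgments):
            """judgments: iterable of (item_id, is_ambiguous: bool) pairs."""
            votes = defaultdict(list)
            for item_id, is_ambiguous in judgments:
                votes[item_id].append(is_ambiguous)
            # Ambiguity score = fraction of annotators who flagged the item.
            return {item_id: sum(v) / len(v) for item_id, v in votes.items()}

        raw = [("q1", False), ("q1", False), ("q1", True),
               ("q2", True), ("q2", True), ("q2", True)]
        print(aggregate_judgments(raw))  # {'q1': 0.333..., 'q2': 1.0}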

    The team learned valuable lessons throughout training, chief among them the importance of automated processes for diagnosing and resolving infrastructure issues, of clean evaluation datasets, and of resource-efficient pre-training experiments. These insights contribute to a better understanding of how to build large, performant models that operate reliably in real-world scenarios.
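
    The following sketch shows the flavor of such an automated infrastructure check: before (re)starting a job, verify that every GPU on a host is visible and within thermal limits, and fail fast so a scheduler can cordon the node. The expected GPU count and temperature threshold are assumptions for illustration; the nvidia-smi query flags used are standard.

        # Hedged sketch of a pre-flight node health check; thresholds and the
        # expected GPU count are illustrative assumptions.
        import subprocess
        import sys

        EXPECTED_GPUS = 8   # hypothetical: one 8-GPU training node
        MAX_TEMP_C = 85     # hypothetical alert threshold

        def gpu_health_report():
            """Query nvidia-smi for per-GPU index, temperature, and memory use."""
            out = subprocess.run(
                ["nvidia-smi",
                 "--query-gpu=index,temperature.gpu,memory.used",
                 "--format=csv,noheader,nounits"],
                capture_output=True, text=True, check=True,
            ).stdout
            gpus = []
            for line in out.strip().splitlines():
                index, temp, mem_used = (field.strip() for field in line.split(","))
                gpus.append({"index": int(index), "temp_c": int(temp),
                             "mem_used_mib": int(mem_used)})
            return gpus

        def check(gpus):
            problems = []
            if len(gpus) != EXPECTED_GPUS:
                problems.append(f"expected {EXPECTED_GPUS} GPUs, found {len(gpus)}")
            problems += [f"GPU {g['index']} too hot: {g['temp_c']}C"
                         for g in gpus if g["temp_c"] > MAX_TEMP_C]
            return problems

        if __name__ == "__main__":
            issues = check(gpu_health_report())
            if issues:
                print("UNHEALTHY:", "; ".join(issues))
                sys.exit(1)  # non-zero exit lets a scheduler cordon this node
            print("node healthy")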

    Key highlights of the research include the following:

    • The Imbue Team trained a 70-billion-parameter model, outperforming GPT-4 in zero-shot reasoning and coding benchmarks.
    • The project addressed practical requirements for building robust coding agents and explored the benefits of pre-training.
    • Key tools and resources developed include CARBS (a cost-aware hyperparameter optimizer), clean evaluation datasets, infrastructure scripts, and a new code-focused reasoning benchmark.
    • Lessons learned emphasized the importance of clean datasets, automated infrastructure processes, and resource-efficient pre-training experiments.
    • The initiative aims to lower the barrier to entry for large-scale model training and encourages innovation in AI research.

    In conclusion, the Imbue Team’s work on this project is part of a broader effort to advance the research and development of AI models. Their focus areas include reinforcement learning, agent and reasoning architectures, data generation techniques, and user experience design. The team is committed to making these powerful capabilities accessible and intuitive for users and continues to explore new frontiers in AI research.

    The post Imbue Team Trains 70B-Parameter Model From Scratch: Innovations in Pre-Training, Evaluation, and Infrastructure for Advanced AI Performance appeared first on MarkTechPost.
