
    EURUS: A Suite of Large Language Models (LLMs) Optimized for Reasoning, Achieving State-of-the-Art Results among Open-Source Models on Diverse Benchmarks

    April 5, 2024

    Large language models (LLMs) have been pivotal in recent advances in artificial intelligence (AI). They handle a wide spectrum of tasks, from understanding natural language to solving complex mathematical problems and generating code. Central to all of these is the ability to reason: to process information logically in order to solve problems, make decisions, or derive insights. However, these models still struggle with many challenging problems. Two primary reasons, among others, are (1) a shortage of high-quality alignment data and (2) the underuse of preference learning to strengthen models’ complex reasoning abilities.

    Existing work includes specialized models such as MAmmoTH-7B-Mistral and WizardMath-7B-v1.1, which focus on mathematical reasoning, and Magicoder-S-DS-6.7B and OpenCodeInterpreter (OpenCI-DS-6.7B/CL-70B), which target coding proficiency. Preference learning has also advanced, with methods such as Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO) improving how well models align with human preferences. However, these contributions often fall short of a unified reasoning capability across diverse domains, a proficiency that proprietary models like GPT-3.5 Turbo and GPT-4 demonstrate more effectively. This highlights a gap in broad-based reasoning ability within the open-source LLM landscape.
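
    As a point of reference for the preference learning methods mentioned above, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is a generic illustration, not the EURUS training code; the function names and the default `beta` value are assumptions made for this example.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative default, not a value from the article
) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy favors the chosen response more
    # strongly (relative to the reference) than the rejected one.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

    When the policy and the reference assign identical log-probabilities, the loss equals log 2; it falls below that value once the policy shifts probability toward the chosen response.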

    EURUS is the result of a collaborative effort by researchers from Tsinghua University, the University of Illinois Urbana-Champaign, Northeastern University, Renmin University of China, ModelBest.Inc, BUPT, and Tencent. Together they have created a collection of LLMs optimized for reasoning. EURUS is built on ULTRA INTERACT, a purpose-built dataset that supports reasoning through preference learning and multi-turn interaction. According to the researchers, this methodology enables EURUS to outperform existing open-source models on reasoning tasks.

    The EURUS methodology combines supervised fine-tuning and preference learning, both using the ULTRA INTERACT dataset. The dataset is organized as preference trees that combine reasoning chains, multi-turn interaction trajectories, and paired actions to support training on complex reasoning. Fine-tuning starts from the foundation models Mistral-7B and CodeLlama-70B, and performance is evaluated on benchmarks such as LeetCode and TheoremQA to assess reasoning across code generation and mathematical tasks. A new reward modeling objective, derived from insights gained through preference learning, is intended to further improve EURUS’s accuracy on reasoning tasks.
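
    To make the idea of preference trees with paired actions more concrete, here is a minimal sketch of how such a structure could be represented and flattened into preference pairs. The class names, fields, and the choice to continue each trajectory from the rejected action plus feedback are hypothetical assumptions for illustration; the actual ULTRA INTERACT layout may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Turn:
    """One turn of a hypothetical preference tree: paired actions for one context."""
    chosen: str         # action judged correct
    rejected: str       # action judged incorrect
    feedback: str = ""  # feedback that leads into the next turn


@dataclass
class PreferenceTree:
    """An instruction at the root, followed by a multi-turn trajectory."""
    instruction: str
    turns: List[Turn] = field(default_factory=list)


def preference_pairs(tree: PreferenceTree) -> List[Tuple[str, str, str]]:
    """Flatten a tree into (context, chosen, rejected) triples, one per turn."""
    pairs = []
    context = tree.instruction
    for turn in tree.turns:
        pairs.append((context, turn.chosen, turn.rejected))
        # Assumption: the next turn sees the failed attempt and its feedback.
        context = "\n".join([context, turn.rejected, turn.feedback])
    return pairs
```

    Each resulting triple can feed a pairwise preference objective, while the chosen actions alone could serve as supervised fine-tuning targets.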

    EURUS-70B achieves 33.3% pass@1 accuracy on LeetCode and 32.6% on TheoremQA. These results exceed those of existing open-source models by margins of more than 13.3%. This performance across diverse benchmarks, including mathematics and code generation tasks, supports EURUS’s ability to tackle complex reasoning challenges, and it sets a new bar among open-source LLMs for both mathematical and coding problem-solving.
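
    For readers unfamiliar with the pass@1 metric cited above, the sketch below shows the widely used unbiased pass@k estimator, which for k = 1 reduces to the fraction of sampled solutions that pass. The function names are chosen for this example, and this is not necessarily the exact evaluation script used for EURUS.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of solutions sampled, c: number that pass all tests,
    k: number of attempts the metric allows.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples
    # contains at least one passing solution.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

    With a single sample per problem, `mean_pass_at_k(results, 1)` is the share of problems solved on the first attempt, which is how a figure such as 33.3% pass@1 is read.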

    To conclude, the research introduces EURUS, a collection of LLMs fine-tuned for advanced reasoning tasks using the ULTRA INTERACT dataset. By significantly improving pass@1 accuracy on benchmarks such as LeetCode and TheoremQA, EURUS demonstrates the potential of specialized datasets and new training methodologies for advancing LLMs’ reasoning capabilities. This work helps narrow the gap between open-source models and their proprietary counterparts and offers useful insights for future work on AI reasoning and problem-solving.

    Check out the Paper, HF Page, and GitHub. All credit for this research goes to the researchers of this project.

    Introducing Eurus, a suite of state-of-the-art LLM reasoning generalists powered by a new member of Ultra-Series, UltraInteract!

    Particularly, Eurus-70B beats GPT-3.5 Turbo in reasoning through a comprehensive benchmarking across 12 tests (mostly OOD) covering five tasks! pic.twitter.com/ijfNaY4dcU

    — Lifan Yuan (@lifan__yuan) April 2, 2024

    The post EURUS: A Suite of Large Language Models (LLMs) Optimized for Reasoning, Achieving State-of-the-Art Results among Open-Source Models on Diverse Benchmarks appeared first on MarkTechPost.
