EURUS: A Suite of Large Language Models (LLMs) Optimized for Reasoning, Achieving State-of-the-Art Results among Open-Source Models on Diverse Benchmarks

Large language models (LLMs) have been pivotal in recent advances in artificial intelligence (AI). They are instrumental across a wide spectrum of tasks, from understanding natural language to solving complex mathematical problems and generating code. Their ability to reason, that is, to process information logically in order to solve problems, make decisions, or derive insights, is paramount. However, these models still struggle with many challenging problems. Two primary reasons, among others, are (1) a shortage of high-quality alignment data and (2) the underuse of preference learning strategies to strengthen the complex reasoning abilities of models.

Existing work includes specialized models such as MAmmoTH-7B-Mistral and WizardMath-7B-v1.1, focused on mathematical reasoning, and Magicoder-S-DS-6.7B and OpenCodeInterpreter (OpenCI-DS-6.7B/CL-70B) for coding proficiency. Preference learning has also seen innovations, with methods such as DPO and KTO improving model alignment with human preferences. However, these contributions often fall short of delivering a unified reasoning capability across diverse domains, a proficiency that proprietary models like GPT-3.5 Turbo and GPT-4 demonstrate more effectively. This highlights a gap in broad-based reasoning ability within the open-source LLM landscape.
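For readers unfamiliar with DPO, the sketch below computes its per-example loss in the commonly published form: the negative log-sigmoid of a scaled difference between how much the policy and a frozen reference model each prefer the chosen response over the rejected one. This is only an illustration of the general technique, not the EURUS training code; the function name, the `beta` default, and the log-probability values are assumptions made for the example.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from summed sequence log-probabilities.

    beta is an illustrative default, not a value taken from the article.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up log-probabilities: the policy prefers the chosen answer more
# strongly than the reference does, so the loss falls below log(2).
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log(2); it shrinks as the policy shifts probability toward the chosen response relative to the reference.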

EURUS is the result of a collaborative effort by researchers from Tsinghua University, the University of Illinois Urbana-Champaign, Northeastern University, Renmin University of China, ModelBest Inc., BUPT, and Tencent. Together they have created a collection of LLMs optimized for reasoning. The approach centers on ULTRA INTERACT, a specially designed dataset that strengthens reasoning through preference learning and multi-turn interaction. This methodology has enabled EURUS to outperform existing open-source models on reasoning tasks.

The EURUS methodology combines supervised fine-tuning and preference learning, both built on the ULTRA INTERACT dataset. This dataset organizes its data as preference trees that combine reasoning chains, multi-turn interaction trajectories, and paired actions to support training for complex reasoning. Fine-tuning starts from the foundation models Mistral-7B and CodeLlama-70B, and performance is evaluated on benchmarks such as LeetCode and TheoremQA to assess reasoning across mathematical and code generation tasks. A new reward modeling objective, derived from insights gained through preference learning, enhances EURUS's decision-making accuracy, positioning it to surpass existing models in reasoning tasks.
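To make the idea of a preference tree with paired actions more concrete, here is a minimal sketch of how such a structure could be represented and flattened into (context, chosen, rejected) examples for preference learning. The class name, field names, and the toy task are hypothetical; the article does not describe the actual ULTRA INTERACT schema, so treat this as an illustration of the concept only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ActionNode:
    """One node of a toy preference tree (hypothetical schema)."""
    text: str                      # instruction at the root, model action elsewhere
    is_correct: bool = False       # ignored for the root instruction
    children: List["ActionNode"] = field(default_factory=list)


def preference_pairs(node: ActionNode,
                     context: Tuple[str, ...] = ()) -> List[Tuple[Tuple[str, ...], str, str]]:
    """Return (context, chosen, rejected) triples from sibling actions."""
    path = context + (node.text,)
    correct = [c for c in node.children if c.is_correct]
    wrong = [c for c in node.children if not c.is_correct]
    # Pair every correct action with every incorrect sibling at this turn.
    pairs = [(path, c.text, w.text) for c in correct for w in wrong]
    # Descend so later turns contribute pairs with a longer context.
    for child in node.children:
        pairs.extend(preference_pairs(child, path))
    return pairs


tree = ActionNode("Instruction: sum the even numbers in xs", children=[
    ActionNode("turn 1: sum(x for x in xs if x % 2 == 0)", is_correct=True),
    ActionNode("turn 1: sum(xs)", children=[
        ActionNode("turn 2: add the parity filter", is_correct=True),
        ActionNode("turn 2: return len(xs)"),
    ]),
])

for ctx, chosen, rejected in preference_pairs(tree):
    print(len(ctx), "|", chosen, ">", rejected)
```

In this toy tree the incorrect first-turn action is expanded into a second turn, so the structure yields one pair per turn, each carrying the trajectory that led to it as context.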

EURUS-70B has demonstrated advanced reasoning capabilities, achieving 33.3% pass@1 accuracy on LeetCode and 32.6% on TheoremQA. These results are significantly higher than those of existing open-source models, surpassing them by margins exceeding 13.3%. This performance across diverse benchmarks, including mathematics and code generation tasks, confirms EURUS's ability to tackle complex reasoning challenges effectively. It sets a new bar for open-source LLMs on both mathematical and coding problem-solving tasks.
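As background on the metric, pass@1 is the k = 1 case of pass@k, the probability that at least one of k sampled solutions passes all tests. The sketch below uses the unbiased estimator commonly applied to code benchmarks; the article does not say how many samples were drawn per problem for EURUS, so the counts here are invented purely for illustration.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Invented results for three problems, 10 samples each.
results = [(10, 3), (10, 0), (10, 10)]
print(f"pass@1 = {benchmark_pass_at_k(results, k=1):.3f}")
```

For k = 1 the estimator reduces to c / n for each problem, so with a single sample per problem the benchmark score is simply the fraction of problems solved on the first attempt.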

To conclude, the research introduced EURUS, a collection of LLMs fine-tuned for advanced reasoning tasks, utilizing the ULTRA INTERACT dataset for enhanced training. By significantly improving pass@1 accuracy on benchmarks such as LeetCode and TheoremQA, EURUS demonstrates the potential of specialized datasets and innovative training methodologies in advancing LLMs' reasoning capabilities. This work contributes to narrowing the gap between open-source models and proprietary counterparts, offering valuable insights for future AI reasoning and problem-solving developments.


Introducing Eurus, a suite of state-of-the-art LLM reasoning generalists powered by a new member of Ultra-Series, UltraInteract!

Particularly, Eurus-70B beats GPT-3.5 Turbo in reasoning through a comprehensive benchmarking across 12 tests (mostly OOD) covering five tasks! pic.twitter.com/ijfNaY4dcU

Lifan Yuan (@lifan__yuan), April 2, 2024

The post EURUS: A Suite of Large Language Models (LLMs) Optimized for Reasoning, Achieving State-of-the-Art Results among Open-Source Models on Diverse Benchmarks appeared first on MarkTechPost.
