Large language models (LLMs) have been pivotal in the recent advances of Artificial Intelligence (AI). These models address a wide spectrum of tasks, from understanding natural language to solving complex mathematical problems and generating code. Their ability to reason, that is, to process information logically in order to solve problems, make decisions, or derive insights, is paramount. Yet these models still struggle on many challenging problems, for reasons that include, among others, (1) a shortage of high-quality alignment data and (2) the underutilization of preference-learning techniques for strengthening models' complex reasoning abilities.
Existing work includes specialized models such as MAmmoTH-7B-Mistral and WizardMath-7B-v1.1 for mathematical reasoning, and Magicoder-S-DS-6.7B and OpenCodeInterpreter (OpenCI-DS-6.7B/CL-70B) for coding proficiency. Preference learning has also seen innovations, with methods such as DPO and KTO aligning models with human preferences. However, these contributions often fall short of delivering a unified reasoning capability across diverse domains, a proficiency that proprietary models like GPT-3.5 Turbo and GPT-4 demonstrate more effectively. This highlights a gap in achieving broad-based reasoning abilities within the open-source LLM landscape.
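For context on the preference-learning side, the sketch below shows the core of the DPO objective from the original DPO paper: the policy is trained to widen the implicit reward margin between preferred and dispreferred responses relative to a frozen reference model. This is a minimal PyTorch sketch for illustration; the tensor values and the beta setting are illustrative, not taken from the EURUS paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Core of the DPO objective (Rafailov et al., 2023).

    Each argument is a 1-D tensor of summed log-probabilities of a
    complete response under the trainable policy or the frozen
    reference model; beta scales the implicit reward.
    """
    # Implicit rewards: beta * log(pi_theta / pi_ref) per response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred and dispreferred responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for one preference pair.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
```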
EURUS is the result of a collaborative effort by researchers from Tsinghua University, the University of Illinois Urbana-Champaign, Northeastern University, Renmin University of China, ModelBest Inc., BUPT, and Tencent. This collective expertise has produced a collection of LLMs optimized for reasoning. EURUS’s approach is distinguished by ULTRA INTERACT, a specially designed dataset that enhances reasoning through preference learning and multi-turn interaction trajectories. This methodology has enabled EURUS to outperform existing open-source models on complex reasoning tasks.
The EURUS methodology combines supervised fine-tuning and preference learning over the ULTRA INTERACT dataset. The dataset organizes preference trees that integrate reasoning chains, multi-turn interaction trajectories, and paired correct and incorrect actions to support complex reasoning training. Fine-tuning starts from the foundation models Mistral-7B and CodeLlama-70B, with performance evaluated on benchmarks such as LeetCode and TheoremQA to assess reasoning across mathematical and code-generation tasks. A new reward-modeling objective, derived from insights gained through preference learning, further sharpens EURUS’s decision-making accuracy, positioning it to surpass existing open-source models on reasoning tasks (a sketch of this objective follows).
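The article does not spell out the new reward-modeling objective. One reading of the EURUS paper is that it augments the standard Bradley-Terry comparison loss with terms that push the absolute reward of chosen responses up and of rejected responses down; the sketch below illustrates that idea under that assumption, with the function name and exact weighting invented for illustration rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def reward_modeling_loss(r_chosen, r_rejected):
    """Hypothetical EURUS-style reward-modeling loss (form assumed).

    r_chosen / r_rejected: scalar reward-model outputs for the
    preferred and dispreferred responses to the same prompt.
    """
    # Standard Bradley-Terry comparison: rank chosen above rejected.
    l_bt = -F.logsigmoid(r_chosen - r_rejected)
    # Assumed direct terms: push chosen rewards positive and rejected
    # rewards negative in absolute value, not only relative to each other.
    l_dr = -F.logsigmoid(r_chosen) - F.logsigmoid(-r_rejected)
    return (l_bt + l_dr).mean()

loss = reward_modeling_loss(torch.tensor([1.2]), torch.tensor([-0.4]))
```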
EURUS-70B demonstrates advanced reasoning capabilities, achieving 33.3% pass@1 accuracy on LeetCode and 32.6% on TheoremQA. These results surpass existing open-source models by margins of more than 13.3%. Consistent performance across diverse benchmarks spanning mathematics and code generation confirms EURUS’s ability to tackle complex reasoning challenges, setting a new bar for open-source LLMs on both mathematical and coding problem-solving tasks.
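For readers unfamiliar with the metric, pass@1 is the standard code-generation measure: the fraction of problems solved by a model's first sampled solution. The widely used unbiased pass@k estimator from Chen et al. (2021) generalizes this; a minimal NumPy sketch (not specific to EURUS's evaluation harness):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples generated per problem
    c: samples that pass all unit tests
    k: evaluation budget (k=1 gives pass@1)
    """
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# pass@1 reduces to the fraction of passing samples, c / n.
print(pass_at_k(n=10, c=3, k=1))  # 0.3
```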
To conclude, the research introduced EURUS, a collection of LLMs fine-tuned for advanced reasoning tasks, utilizing the ULTRA INTERACT dataset for enhanced training. By significantly improving pass@1 accuracy on benchmarks such as LeetCode and TheoremQA, EURUS demonstrates the potential of specialized datasets and innovative training methodologies in advancing LLMs’ reasoning capabilities. This work contributes to narrowing the gap between open-source models and proprietary counterparts, offering valuable insights for future AI reasoning and problem-solving developments.
Check out the Paper, HF Page, and GitHub. All credit for this research goes to the researchers of this project.