    This AI Paper Introduces ReasonEval: A New Machine Learning Method to Evaluate Mathematical Reasoning Beyond Accuracy

    April 10, 2024

Mathematical reasoning is vital for problem-solving and decision-making, particularly in large language models (LLMs). Evaluations of LLMs' mathematical reasoning usually focus on the final result rather than the intricacies of the reasoning process. Current methodologies, like the OpenLLM leaderboard, primarily use overall accuracy, potentially overlooking logical errors or inefficient steps. Enhanced evaluation approaches are needed to uncover these underlying issues and improve LLMs' reasoning.

Existing approaches typically evaluate mathematical reasoning in LLMs by comparing final answers with the ground truth and computing overall accuracy. Some methods instead assess reasoning quality by comparing generated solution steps with reference ones; however, even when datasets provide reference solutions, the many valid reasoning paths to the same answer make reliance on any single reference unreliable. Prompting-based methods directly ask LLMs, often GPT-4, to judge generated solutions, but their high computational cost and lack of transparency make them impractical for iterative model development.
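For context, here is a minimal sketch of the final-answer evaluation this paragraph describes; the extraction logic and all names are illustrative assumptions, not any benchmark's actual code. Only the extracted answer is compared with the reference, so flawed intermediate steps go unnoticed.

```python
def extract_final_answer(solution: str) -> str:
    """Naive extraction: take the text after 'Answer:' on the last such line."""
    for line in reversed(solution.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return solution.strip().splitlines()[-1].strip()

def final_answer_accuracy(solutions: list[str], references: list[str]) -> float:
    """Overall accuracy: fraction of solutions whose final answer matches."""
    correct = sum(extract_final_answer(s) == r for s, r in zip(solutions, references))
    return correct / len(solutions)

# A solution with a broken middle step still scores 1.0 here.
flawed = "Step 1: 2 + 2 = 5\nStep 2: subtract 1 to fix it\nAnswer: 4"
print(final_answer_accuracy([flawed], ["4"]))  # 1.0
```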

Researchers from Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory, Yale University, Carnegie Mellon University, and the Generative AI Research Lab (GAIR) introduced REASONEVAL, a new approach to evaluating reasoning quality beyond final-answer accuracy. It characterizes the quality of reasoning steps with validity and redundancy metrics, which are assessed automatically by accompanying LLMs. REASONEVAL instantiates its evaluation framework with base models that have strong mathematical knowledge, trained on high-quality labeled data.
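A minimal sketch of that step-level assessment, assuming the evaluator returns a probability distribution over three quality labels per step; the classifier below is a toy stub standing in for the fine-tuned model, and every name is hypothetical.

```python
from typing import Dict, List

def classify_step(question: str, prior_steps: List[str], step: str) -> Dict[str, float]:
    """Toy heuristic in place of the trained evaluator, which would score the
    step given the question and the steps produced so far."""
    if "???" in step:  # hypothetical marker of a dubious step
        return {"positive": 0.1, "neutral": 0.2, "negative": 0.7}
    return {"positive": 0.8, "neutral": 0.15, "negative": 0.05}

def label_solution(question: str, steps: List[str]) -> List[Dict[str, float]]:
    """Label every step of a multi-step solution in order."""
    return [classify_step(question, steps[:i], s) for i, s in enumerate(steps)]

print(label_solution("What is 2 + 2?", ["2 + 2 = 4", "??? therefore 5"]))
```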

REASONEVAL focuses on multi-step reasoning tasks, assessing the quality of reasoning beyond final-answer accuracy. It evaluates each reasoning step for validity and redundancy, assigning each step a positive, neutral, or negative label. Step-level scores are computed from these validity and redundancy judgments and then aggregated into solution-level scores, as sketched below. The method utilizes various LLMs with different base models, sizes, and training strategies. Training data is sourced from PRM800K, a dataset of step-by-step solutions labeled by human annotators.
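One plausible aggregation consistent with this description, turning per-step label probabilities into solution-level validity and redundancy scores; the exact formulas and the min/max reductions are assumptions for illustration, not the paper's definitions.

```python
def step_scores(probs: dict) -> tuple[float, float]:
    # Assumed definitions: a step is valid unless judged negative, and
    # redundant in proportion to its neutral mass.
    validity = probs["positive"] + probs["neutral"]
    redundancy = probs["neutral"]
    return validity, redundancy

def solution_scores(step_probs: list[dict]) -> tuple[float, float]:
    """Assumed reductions: one bad step breaks validity (min over steps);
    redundancy is judged by the worst offender (max over steps)."""
    validities, redundancies = zip(*(step_scores(p) for p in step_probs))
    return min(validities), max(redundancies)

demo = [
    {"positive": 0.90, "neutral": 0.05, "negative": 0.05},
    {"positive": 0.20, "neutral": 0.10, "negative": 0.70},  # faulty step
]
print(solution_scores(demo))  # ~(0.30, 0.10): the faulty step caps validity
```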

REASONEVAL achieves state-of-the-art performance on human-labeled datasets and accurately detects different error types introduced by perturbation. It reveals that improved final-answer accuracy does not consistently translate into higher-quality reasoning steps on complex mathematical problems. The method's assessments also aid in data selection. Notably, logical and calculation errors cause significant drops in validity scores while redundancy scores remain stable, so REASONEVAL distinguishes errors that affect validity from those that merely introduce redundancy.
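A toy illustration of the perturbation check mentioned above: corrupt one step of a correct solution and observe that a step-aware validity score drops, even though the final answer could be patched to stay correct; the scorer is a stand-in heuristic, not REASONEVAL itself.

```python
def toy_validity(steps: list[str]) -> float:
    """Fraction of steps whose stated arithmetic actually holds."""
    ok = 0
    for s in steps:
        lhs, _, rhs = s.partition("=")
        try:
            ok += eval(lhs, {"__builtins__": {}}) == int(rhs)
        except Exception:
            pass
    return ok / len(steps)

clean = ["2+3=5", "5*4=20"]
perturbed = ["2+3=6", "5*4=20"]  # calculation error injected in step 1

print(toy_validity(clean))      # 1.0
print(toy_validity(perturbed))  # 0.5 -> validity reflects the broken step
```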

In conclusion, the research introduces REASONEVAL, an effective metric for assessing the quality of reasoning steps in terms of correctness and efficiency. Experiments confirm its ability to identify diverse error types and its competitive performance compared with existing methods. REASONEVAL exposes inconsistencies between final-answer accuracy and reasoning-step quality, while also proving effective for selecting training data.

Check out the Paper. All credit for this research goes to the researchers of this project.

    The post This AI Paper Introduces ReasonEval: A New Machine Learning Method to Evaluate Mathematical Reasoning Beyond Accuracy appeared first on MarkTechPost.
