
    Google DeepMind Researchers Propose a Novel Divide-and-Conquer Style Monte Carlo Tree Search (MCTS) Algorithm ‘OmegaPRM’ for Efficiently Collecting High-Quality Process Supervision Data

    June 16, 2024

    Artificial intelligence (AI) focuses on creating systems capable of performing tasks that normally require human intelligence. Within this field, large language models (LLMs) are developed to understand and generate human language, with applications in translation, summarization, and question answering. Despite these advances, complex multi-step reasoning tasks, such as solving mathematical problems, remain difficult for even the most advanced LLMs. Enhancing the reasoning capabilities of these models is crucial for improving their performance on such tasks.

    A significant problem in AI is improving the reasoning abilities of LLMs, especially for tasks requiring multiple logical steps. Current models often make intermediate-step errors, leading to incorrect final answers. Addressing these errors in the intermediate stages is essential for better performance in complex reasoning tasks. The focus is on creating methods that can more accurately guide LLMs through each step of the reasoning process.

    Existing research includes various frameworks and models to improve LLM reasoning capabilities. Chain-of-Thought (CoT) prompting guides LLMs to break down tasks into intermediate steps, enhancing performance. Outcome Reward Models (ORMs) and Process Reward Models (PRMs) provide feedback, with PRMs offering more detailed supervision at each step. Current methods like Math-Shepherd and MiPS use Monte Carlo estimation to automate data collection, while self-consistency decoding and fine-tuning with high-quality datasets have also improved LLM reasoning.
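    The Monte Carlo estimation mentioned above can be illustrated with a minimal sketch: the value of a partial solution is approximated by the fraction of sampled completions that reach the known correct answer. The names `mc_step_value` and `toy_rollout` are hypothetical, and `toy_rollout` is only a stand-in for sampling from an LLM; this is not the code used by Math-Shepherd, MiPS, or OmegaPRM.

```python
import random
from typing import Callable, List


def mc_step_value(
    prefix: List[str],
    rollout: Callable[[List[str], random.Random], str],
    gold_answer: str,
    k: int = 8,
    seed: int = 0,
) -> float:
    """Fraction of k sampled completions from `prefix` that end in `gold_answer`."""
    rng = random.Random(seed)
    hits = sum(rollout(prefix, rng) == gold_answer for _ in range(k))
    return hits / k


def toy_rollout(prefix: List[str], rng: random.Random) -> str:
    # Hypothetical stand-in for an LLM: a prefix with a bad step never recovers,
    # while a clean prefix reaches the right answer about 75% of the time.
    if any("WRONG" in step for step in prefix):
        return "0"
    return "42" if rng.random() < 0.75 else "0"


good = mc_step_value(["step 1", "step 2"], toy_rollout, "42", k=200)
bad = mc_step_value(["step 1", "WRONG step"], toy_rollout, "42", k=200)
```

    In this toy setup the prefix containing the faulty step gets a value of zero, which is the signal used to label that step as incorrect.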

    Researchers at Google DeepMind and Google introduced OmegaPRM, a novel method for automated process supervision data collection. The method employs a divide-and-conquer style Monte Carlo Tree Search (MCTS) algorithm that uses binary search to efficiently identify the first error in a reasoning chain, and it balances the collection of positive and negative examples, which supports both quality and efficiency. Because the approach is automated, it avoids costly human annotation, making it a scalable way to improve LLM performance.
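    The idea of locating the first error with binary search can be sketched as follows. This is an illustrative simplification, not the OmegaPRM implementation: `first_error_index` and `prefix_can_succeed` are hypothetical names, and the predicate stands in for "at least one Monte Carlo rollout from this prefix reaches the correct answer". The sketch also assumes that once a prefix contains an error, every longer prefix fails as well, and that the empty prefix (the question alone) can succeed.

```python
from typing import Callable, List, Optional


def first_error_index(
    steps: List[str],
    prefix_can_succeed: Callable[[List[str]], bool],
) -> Optional[int]:
    """Return the index of the first erroneous step, or None if none is found."""
    if prefix_can_succeed(steps):
        return None  # the full chain can still reach the correct answer

    # Invariant: steps[:lo] can succeed, steps[:hi] cannot.
    lo, hi = 0, len(steps)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if prefix_can_succeed(steps[:mid]):
            lo = mid  # error lies in the later half
        else:
            hi = mid  # error lies in the earlier half
    return hi - 1  # adding this step turned a viable prefix into a failing one


steps = ["a", "b", "BAD", "d", "e"]
idx = first_error_index(steps, lambda prefix: "BAD" not in prefix)
```

    Each check halves the remaining span, so the number of prefix evaluations grows logarithmically with the chain length instead of requiring rollouts from every step.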

    The OmegaPRM methodology involves creating a state-action tree to represent detailed reasoning paths for questions. Nodes contain the question and preceding reasoning steps, while edges indicate subsequent steps. The algorithm uses temperature sampling to generate multiple completions, which are treated as an approximate action space. The researchers collected over 1.5 million process supervision annotations from the MATH dataset and used them to train a PRM. Combined with the weighted self-consistency algorithm, this PRM improved the performance of the Gemini Pro model, demonstrating the effectiveness of OmegaPRM for training PRMs.
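    Weighted self-consistency can be illustrated with a short sketch: instead of a plain majority vote over sampled answers, each answer's votes are weighted by a reward score, and the answer with the highest total wins. The function name, the sample data, and the assumption that each sampled solution already carries a single PRM-derived score are all hypothetical; how per-step scores are combined into that single score is not covered here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_self_consistency(candidates: List[Tuple[str, float]]) -> str:
    """Pick the final answer whose sampled solutions have the largest total score."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in candidates:
        totals[answer] += score
    return max(totals, key=totals.get)


# (final answer, hypothetical PRM score for the solution that produced it)
samples = [("12", 0.9), ("15", 0.4), ("15", 0.3), ("15", 0.35), ("12", 0.8)]
best = weighted_self_consistency(samples)
```

    In this example a plain majority vote would choose "15" (three samples against two), while the weighted vote chooses "12" because its two solutions receive a higher combined score.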

    The OmegaPRM algorithm enhances the instruction-tuned Gemini Pro model’s mathematical reasoning performance. Utilizing the weighted self-consistency algorithm alongside automated process supervision, the model achieved a 69.4% success rate on the MATH benchmark. This success rate represents a 36% relative improvement from the base model’s 51% performance. The researchers’ automated approach ensures that data collection costs are significantly reduced compared to human annotation and brute-force Monte Carlo sampling methods. These improvements underscore the potential of OmegaPRM in advancing LLM capabilities in complex multi-step reasoning tasks.
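    The 36% figure is a relative improvement, not a difference in percentage points. A quick check using only the two success rates reported above:

```python
base = 51.0      # base model success rate on MATH, in percent
improved = 69.4  # success rate with OmegaPRM-trained PRM, in percent

absolute_gain = improved - base        # gain in percentage points
relative_gain = absolute_gain / base   # gain relative to the base rate

print(f"absolute: {absolute_gain:.1f} points, relative: {relative_gain:.1%}")
```

    The absolute gain is 18.4 percentage points, which corresponds to a relative gain of about 36% over the 51% baseline.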

    In conclusion, the research team at Google DeepMind and Google successfully tackled the challenge of improving LLM mathematical reasoning through automated process supervision. The OmegaPRM method enhances performance and reduces reliance on costly human annotation, making it a significant advancement in AI reasoning tasks. The methodology’s efficiency and the model’s improved performance underscore OmegaPRM’s potential to revolutionize complex multi-step reasoning in AI.

    The post Google DeepMind Researchers Propose a Novel Divide-and-Conquer Style Monte Carlo Tree Search (MCTS) Algorithm ‘OmegaPRM’ for Efficiently Collecting High-Quality Process Supervision Data appeared first on MarkTechPost.
