Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      The AI productivity paradox in software engineering: Balancing efficiency and human skill retention

      July 2, 2025

      The impact of gray work on software development

      July 2, 2025

      CSS Intelligence: Speculating On The Future Of A Smarter Language

      July 2, 2025

      Hallucinated code, real threat: How slopsquatting targets AI-assisted development

      July 1, 2025

      Xbox is cancelling Rare’s ‘Everwild’ and ZeniMax’s new MMORPG IP as part of broader cuts — with ‘Perfect Dark’ impacted as well

      July 2, 2025

      Microsoft is closing down Xbox studio The Initiative, with Perfect Dark killed as well — joining Everwild and ZeniMax’s new IP, and other unannounced projects

      July 2, 2025

      No, Microsoft and Xbox’s Phil Spencer isn’t stepping down any time soon — here’s the truth

      July 2, 2025

      Everwild’s cancellation has me worried for one of my favorite dev teams and Xbox itself — It needs creative new games to thrive and refresh its identity

      July 2, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Trust but Verify: The Curious Case of AI Hallucinations

      July 2, 2025
      Recent

      Trust but Verify: The Curious Case of AI Hallucinations

      July 2, 2025

      From Flow to Fabric: Connecting Power Automate to Microsoft Fabric

      July 2, 2025

      Flutter Web Hot Reload Has Landed – No More Refreshes!

      July 2, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Xbox is cancelling Rare’s ‘Everwild’ and ZeniMax’s new MMORPG IP as part of broader cuts — with ‘Perfect Dark’ impacted as well

      July 2, 2025
      Recent

      Xbox is cancelling Rare’s ‘Everwild’ and ZeniMax’s new MMORPG IP as part of broader cuts — with ‘Perfect Dark’ impacted as well

      July 2, 2025

      Microsoft is closing down Xbox studio The Initiative, with Perfect Dark killed as well — joining Everwild and ZeniMax’s new IP, and other unannounced projects

      July 2, 2025

      No, Microsoft and Xbox’s Phil Spencer isn’t stepping down any time soon — here’s the truth

      July 2, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Meet ReSearch: A Novel AI Framework that Trains LLMs to Reason with Search via Reinforcement Learning without Using Any Supervised Data on Reasoning Steps

    Meet ReSearch: A Novel AI Framework that Trains LLMs to Reason with Search via Reinforcement Learning without Using Any Supervised Data on Reasoning Steps

    April 1, 2025

    Large language models (LLMs) have demonstrated significant progress across various tasks, particularly in reasoning capabilities. However, effectively integrating reasoning processes with external search operations remains challenging, especially for multi-hop questions requiring intricate reasoning chains and multiple retrieval steps. Current methods primarily depend on manually designed prompts or heuristics, posing limitations in scalability and flexibility. Additionally, generating supervised data for multi-step reasoning scenarios is often prohibitively expensive and practically infeasible.

    Researchers from Baichuan Inc., Tongji University, The University of Edinburgh, and Zhejiang University introduce ReSearch, a novel AI framework designed to train LLMs to integrate reasoning with search via reinforcement learning, notably without relying on supervised reasoning steps. The core methodology of ReSearch incorporates search operations directly into the reasoning chain. Utilizing Group Relative Policy Optimization (GRPO), a reinforcement learning technique, ReSearch guides LLMs to autonomously identify optimal moments and strategies for performing search operations, which subsequently influence ongoing reasoning. This approach enables models to progressively refine their reasoning and naturally facilitates advanced capabilities such as reflection and self-correction.

    From a technical perspective, ReSearch employs structured output formats by embedding specific tags—such as <think>, <search>, <result>, and <answer>—within the reasoning chain. These tags facilitate clear communication between the model and the external retrieval environment, systematically organizing generated outputs. During training, ReSearch intentionally excludes retrieval results from loss computations to prevent model bias. Reward signals guiding the reinforcement learning process are based on straightforward criteria: accuracy assessment through F1 scores and adherence to the predefined structured output format. This design encourages the autonomous development of sophisticated reasoning patterns, circumventing the need for manually annotated reasoning datasets.

    Experimental evaluation confirms the robustness of ReSearch. When assessed on multi-hop question-answering benchmarks, including HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle, ReSearch consistently outperformed baseline methods. Specifically, ReSearch-Qwen-32B-Instruct achieved improvements ranging between 8.9% and 22.4% in performance compared to established baselines. Notably, these advancements were achieved despite the model being trained exclusively on a single dataset, underscoring its strong generalization capabilities. Further analyses demonstrated that models gradually increased their reliance on iterative search operations throughout training, indicative of enhanced reasoning proficiency. A detailed case study illustrated the model’s capacity to identify suboptimal search queries, reflect on its reasoning steps, and implement corrective actions autonomously.

    In summary, ReSearch presents a significant methodological advancement in training LLMs to seamlessly integrate reasoning with external search mechanisms via reinforcement learning. By eliminating dependency on supervised reasoning data, this framework effectively addresses critical scalability and adaptability issues inherent in multi-hop reasoning scenarios. Its capability for self-reflection and correction enhances its practical applicability in complex, realistic contexts. Future research directions may further extend this reinforcement learning-based framework to broader applications and incorporate additional external knowledge resources.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 85k+ ML SubReddit.

    The post Meet ReSearch: A Novel AI Framework that Trains LLMs to Reason with Search via Reinforcement Learning without Using Any Supervised Data on Reasoning Steps appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleShift-Left Automation: Enhancing Software Quality with Smart Testing
    Next Article How to Use Git and Git Bash Locally: A Comprehensive Guide

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    July 2, 2025
    Machine Learning

    The Super Weight in Large Language Models

    July 2, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    How to Build a LangGraph and Composio-Powered Discord Bot

    Development

    How to Work Better with Git in Teams

    Linux

    CVE-2025-5170 – LliSoft MTA Maita Training System SQL Injection Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Give Your App a Voice: A Guide to Integrating AI Voice APIs

    Web Development

    Highlights

    Critical Kaleris Navis N4 Flaw (CVE-2025-2566, CVSS 9.8): Supply Chain Infrastructure at Risk!

    June 25, 2025

    Critical Kaleris Navis N4 Flaw (CVE-2025-2566, CVSS 9.8): Supply Chain Infrastructure at Risk!

    Two newly disclosed vulnerabilities in the Kaleris Navis N4 terminal operating system could allow attackers to remotely compromise container terminal infrastructure, according to a security advisory r …
    Read more

    Published Date:
    Jun 25, 2025 (3 hours, 45 minutes ago)

    Vulnerabilities has been mentioned in this article.

    CVE-2025-5087

    CVE-2025-2566

    CVE-2024-12378

    CVSS 10.0 Vulnerability Found in Ubiquity UniFi Protect Cameras

    May 9, 2025

    CVE-2025-5180 – Wondershare Filmora Local Path Injection Vulnerability

    May 26, 2025

    Critical ModSecurity WAF Vulnerability Allows Denial of Service via Empty XML Tags

    July 2, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.