Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      The AI productivity paradox in software engineering: Balancing efficiency and human skill retention

      July 2, 2025

      The impact of gray work on software development

      July 2, 2025

      CSS Intelligence: Speculating On The Future Of A Smarter Language

      July 2, 2025

      Hallucinated code, real threat: How slopsquatting targets AI-assisted development

      July 1, 2025

      Xbox is cancelling Rare’s ‘Everwild’ and ZeniMax’s new MMORPG IP as part of broader cuts — with ‘Perfect Dark’ impacted as well

      July 2, 2025

      Microsoft is closing down Xbox studio The Initiative, with Perfect Dark killed as well — joining Everwild and ZeniMax’s new IP, and other unannounced projects

      July 2, 2025

      No, Microsoft and Xbox’s Phil Spencer isn’t stepping down any time soon — here’s the truth

      July 2, 2025

      Everwild’s cancellation has me worried for one of my favorite dev teams and Xbox itself — It needs creative new games to thrive and refresh its identity

      July 2, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Trust but Verify: The Curious Case of AI Hallucinations

      July 2, 2025
      Recent

      Trust but Verify: The Curious Case of AI Hallucinations

      July 2, 2025

      From Flow to Fabric: Connecting Power Automate to Microsoft Fabric

      July 2, 2025

      Flutter Web Hot Reload Has Landed – No More Refreshes!

      July 2, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Xbox is cancelling Rare’s ‘Everwild’ and ZeniMax’s new MMORPG IP as part of broader cuts — with ‘Perfect Dark’ impacted as well

      July 2, 2025
      Recent

      Xbox is cancelling Rare’s ‘Everwild’ and ZeniMax’s new MMORPG IP as part of broader cuts — with ‘Perfect Dark’ impacted as well

      July 2, 2025

      Microsoft is closing down Xbox studio The Initiative, with Perfect Dark killed as well — joining Everwild and ZeniMax’s new IP, and other unannounced projects

      July 2, 2025

      No, Microsoft and Xbox’s Phil Spencer isn’t stepping down any time soon — here’s the truth

      July 2, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Meet ReSearch: A Novel AI Framework that Trains LLMs to Reason with Search via Reinforcement Learning without Using Any Supervised Data on Reasoning Steps

    Meet ReSearch: A Novel AI Framework that Trains LLMs to Reason with Search via Reinforcement Learning without Using Any Supervised Data on Reasoning Steps

    April 1, 2025

    Large language models (LLMs) have demonstrated significant progress across various tasks, particularly in reasoning capabilities. However, effectively integrating reasoning processes with external search operations remains challenging, especially for multi-hop questions requiring intricate reasoning chains and multiple retrieval steps. Current methods primarily depend on manually designed prompts or heuristics, posing limitations in scalability and flexibility. Additionally, generating supervised data for multi-step reasoning scenarios is often prohibitively expensive and practically infeasible.

    Researchers from Baichuan Inc., Tongji University, The University of Edinburgh, and Zhejiang University introduce ReSearch, a novel AI framework designed to train LLMs to integrate reasoning with search via reinforcement learning, notably without relying on supervised reasoning steps. The core methodology of ReSearch incorporates search operations directly into the reasoning chain. Utilizing Group Relative Policy Optimization (GRPO), a reinforcement learning technique, ReSearch guides LLMs to autonomously identify optimal moments and strategies for performing search operations, which subsequently influence ongoing reasoning. This approach enables models to progressively refine their reasoning and naturally facilitates advanced capabilities such as reflection and self-correction.

    From a technical perspective, ReSearch employs structured output formats by embedding specific tags—such as <think>, <search>, <result>, and <answer>—within the reasoning chain. These tags facilitate clear communication between the model and the external retrieval environment, systematically organizing generated outputs. During training, ReSearch intentionally excludes retrieval results from loss computations to prevent model bias. Reward signals guiding the reinforcement learning process are based on straightforward criteria: accuracy assessment through F1 scores and adherence to the predefined structured output format. This design encourages the autonomous development of sophisticated reasoning patterns, circumventing the need for manually annotated reasoning datasets.

    Experimental evaluation confirms the robustness of ReSearch. When assessed on multi-hop question-answering benchmarks, including HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle, ReSearch consistently outperformed baseline methods. Specifically, ReSearch-Qwen-32B-Instruct achieved improvements ranging between 8.9% and 22.4% in performance compared to established baselines. Notably, these advancements were achieved despite the model being trained exclusively on a single dataset, underscoring its strong generalization capabilities. Further analyses demonstrated that models gradually increased their reliance on iterative search operations throughout training, indicative of enhanced reasoning proficiency. A detailed case study illustrated the model’s capacity to identify suboptimal search queries, reflect on its reasoning steps, and implement corrective actions autonomously.

    In summary, ReSearch presents a significant methodological advancement in training LLMs to seamlessly integrate reasoning with external search mechanisms via reinforcement learning. By eliminating dependency on supervised reasoning data, this framework effectively addresses critical scalability and adaptability issues inherent in multi-hop reasoning scenarios. Its capability for self-reflection and correction enhances its practical applicability in complex, realistic contexts. Future research directions may further extend this reinforcement learning-based framework to broader applications and incorporate additional external knowledge resources.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 85k+ ML SubReddit.

    The post Meet ReSearch: A Novel AI Framework that Trains LLMs to Reason with Search via Reinforcement Learning without Using Any Supervised Data on Reasoning Steps appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleShift-Left Automation: Enhancing Software Quality with Smart Testing
    Next Article How to Use Git and Git Bash Locally: A Comprehensive Guide

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    July 2, 2025
    Machine Learning

    The Super Weight in Large Language Models

    July 2, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    Connect Amazon Bedrock Agents with Amazon Aurora PostgreSQL using Amazon RDS Data API

    Databases

    Verdansk is coming back to Call of Duty: Warzone, and I’m pretty sure I know which weapon EVERYONE will be putting in their loadouts

    News & Updates

    Engineering Smarter Data Pipelines with Autonomous AI

    Development

    CVE-2025-31926 – LambertGroup Sticky Radio Player SQL Injection Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Highlights

    Mapping the misuse of generative AI

    May 29, 2025

    New research analyzes the misuse of multimodal generative AI today, in order to help build…

    Google DeepMind at NeurIPS 2023

    May 29, 2025

    3 Questions: How to help students recognize potential bias in their AI datasets

    June 2, 2025

    3 productivity gadgets I can’t work without (and why they make such a big difference)

    June 23, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.