
    Reinforcement Learning Makes LLMs Search-Savvy: Ant Group Researchers Introduce SEM to Optimize Tool Usage and Reasoning Efficiency

    May 19, 2025

    Recent progress in LLMs has shown their potential for complex reasoning and for using external tools such as search engines. Even so, teaching a model to decide when to rely on internal knowledge and when to search remains a key challenge. Simple prompt-based methods can guide models to invoke tools, but LLMs still struggle with more nuanced behaviors, such as recognizing that an initial search was off target and deciding to search again. Reinforcement learning (RL) has been explored as a way to improve these behaviors by rewarding effective search usage. However, RL often leads to unnecessary tool use, with models issuing redundant searches even for simple tasks, an inefficiency that must be addressed.

    Various training strategies, including Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO), have been used to align LLM behavior with human expectations. PPO limits how far each update can move the policy, balancing exploration against training stability. DPO simplifies alignment by optimizing the model directly on preference data, without a separate reward model. GRPO scores each sampled response relative to a group of responses to the same prompt, which helps capture subtle improvements in reasoning. Meanwhile, treating LLMs as autonomous agents that plan and execute multi-step reasoning tasks is gaining traction. Frameworks like AutoGPT and LangChain showcase how these agents can refine their outputs through iterative reasoning and search. Yet current agent systems often depend on fixed prompts or heuristic-based tool use, which limits their adaptability and efficiency.
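
    To make the group-based scoring in GRPO concrete, here is a minimal sketch of a group-relative advantage: each response's reward is normalized against the mean and standard deviation of the rewards for all responses sampled for the same prompt. The function name, the `eps` stabilizer, and the use of the population standard deviation are illustrative assumptions, not details taken from the SEM paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group sampled for one prompt.

    Hypothetical helper: `eps` only guards against division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt, scored by some reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

    Responses that score above the group average get a positive advantage and those below get a negative one, so the policy is pushed toward the better responses within each group without a separately trained value model.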

    Researchers at Ant Group introduce SEM, a post-training reinforcement learning framework designed to teach LLMs when to use search tools and when to rely on internal knowledge. By training on a balanced dataset combining questions that do and do not require external retrieval, SEM guides the model to issue search requests only when necessary. Using a structured reasoning format and GRPO, the framework rewards accurate answers without search and penalizes unnecessary tool use. Results show that SEM improves response accuracy and efficiency, helping models better judge when external information is needed, thus enhancing reasoning in complex scenarios. 

    To integrate search tools into a model’s reasoning process, SEM uses reinforcement learning to teach models when and how to use search effectively. The training data combines MuSiQue (questions needing external information) and MMLU (questions answerable from prior knowledge), helping models learn to judge when search is necessary. Under the GRPO framework, the model is rewarded for accurate, efficient answers: unnecessary searches are discouraged, while searches are encouraged when internal knowledge falls short. A structured response format (<think>, <answer>, <search>, <result>) standardizes training and allows for precise reward assignment, improving both reasoning quality and search decision-making.
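
    The structured format and reward idea described above can be sketched as follows. This is an illustration under stated assumptions, not SEM's actual reward function: the tag names come from the article, but `parse_response`, `sketch_reward`, the `needs_search` flag, the exact-match check, and the penalty value of 0.5 are all hypothetical.

```python
import re

# Matches <think>...</think>, <answer>...</answer>, <search>...</search>,
# and <result>...</result> segments in the order they appear.
TAG_PATTERN = re.compile(r"<(think|answer|search|result)>(.*?)</\1>", re.DOTALL)


def parse_response(text):
    """Return (tag, content) pairs for each tagged segment."""
    return [(m.group(1), m.group(2).strip()) for m in TAG_PATTERN.finditer(text)]


def sketch_reward(text, gold_answer, needs_search, penalty=0.5):
    """Hypothetical reward: correct answers score 1.0, minus a penalty
    when the model searched for a question that did not need it."""
    segments = parse_response(text)
    answers = [content for tag, content in segments if tag == "answer"]
    if not answers:
        return 0.0  # malformed output: nothing to score
    # Assumption: case-insensitive exact match on the final answer.
    if answers[-1].casefold() != gold_answer.casefold():
        return 0.0
    searched = any(tag == "search" for tag, _ in segments)
    if searched and not needs_search:
        return 1.0 - penalty
    return 1.0


direct = "<think>Paris is the capital.</think><answer>Paris</answer>"
with_search = (
    "<think>I should check.</think><search>capital of France</search>"
    "<result>Paris</result><answer>Paris</answer>"
)
```

    Because every segment is tagged, the reward can inspect the search calls and the final answer separately, which is what makes per-behavior reward assignment straightforward.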

    The study evaluates a model trained to determine when to rely on its internal knowledge and when to use external search. Training combines MuSiQue (unfamiliar questions) and MMLU (familiar questions), and performance is evaluated on datasets such as HotpotQA, GSM8K, and MMLU. The proposed SEM method outperforms baselines such as Naive RAG and ReSearch in both answer accuracy and search efficiency. SEM reduces unnecessary searches on known questions while improving reasoning on unknown ones. Case studies and training curves indicate that SEM learns stably and makes sensible search decisions. Overall, SEM enhances retrieval decisions and internal reasoning in large language models.
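
    One simple way to quantify search efficiency is the fraction of responses that issue at least one search call. The sketch below assumes responses use the <search> tag from the structured format; the function name and the exact definition are illustrative and may differ from the metric reported in the paper.

```python
def search_ratio(responses):
    """Fraction of responses containing at least one <search> call."""
    if not responses:
        return 0.0
    return sum("<search>" in r for r in responses) / len(responses)


# Hypothetical model outputs for two questions.
sample = [
    "<think>Simple arithmetic.</think><answer>4</answer>",
    "<search>query</search><result>doc</result><answer>a</answer>",
]
ratio = search_ratio(sample)
```

    Under this definition, a well-calibrated model would show a low ratio on questions it already knows and a high ratio on questions that need external information.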

    In conclusion, SEM is a post-training reinforcement learning framework designed to improve how large language models use external search tools. The model is trained on a dataset combining MuSiQue and MMLU, helping it distinguish between questions it can answer internally and those that require external retrieval. SEM uses a structured reasoning approach and a reward function that penalizes unnecessary searches while promoting accurate and efficient retrieval. Experiments on benchmarks like HotpotQA, GSM8K, and MMLU show that SEM reduces redundant searches and improves accuracy. This approach enhances reasoning efficiency and intelligent use of external knowledge in LLMs. 


    All credit for this research goes to the researchers of this project.

    The post Reinforcement Learning Makes LLMs Search-Savvy: Ant Group Researchers Introduce SEM to Optimize Tool Usage and Reasoning Efficiency appeared first on MarkTechPost.
