Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      10 Benefits of Hiring a React.js Development Company (2025–2026 Edition)

      August 13, 2025

      From Line To Layout: How Past Experiences Shape Your Design Career

      August 13, 2025

      Hire React.js Developers in the US: How to Choose the Right Team for Your Needs

      August 13, 2025

      Google’s coding agent Jules gets critique functionality

      August 13, 2025

      The best smartphones without AI features in 2025: Expert tested and recommended

      August 13, 2025

      GPT-5 was supposed to simplify ChatGPT but now it has 4 new modes – here’s why

      August 13, 2025

      Gemini just got two of ChatGPT’s best features – and they’re free

      August 13, 2025

      The HP OmniBook 5 laptop offers 34 hours of battery life – and it’s 60% off today only

      August 13, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Laravel Boost is released

      August 13, 2025
      Recent

      Laravel Boost is released

      August 13, 2025

      Frontend Standards for Optimizely Configured Commerce: Clean & Scalable Web Best Practices

      August 13, 2025

      Live Agent Escalation in Copilot Studio Using D365 Omnichannel – Architecture and Use Case

      August 13, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      OpenAI’s Sam Altman: GPT-5 fails to meet AGI standards amid Microsoft’s fading partnership — “it’s still missing something”

      August 13, 2025
      Recent

      OpenAI’s Sam Altman: GPT-5 fails to meet AGI standards amid Microsoft’s fading partnership — “it’s still missing something”

      August 13, 2025

      You Think You Need a Monster PC to Run Local AI, Don’t You? — My Seven-Year-Old Mid-range Laptop Says Otherwise

      August 13, 2025

      8 Registry Tweaks that will Make File Explorer Faster and Easier to Use on Windows 11

      August 13, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Reinforcement Learning, Not Fine-Tuning: Nemotron-Tool-N1 Trains LLMs to Use Tools with Minimal Supervision and Maximum Generalization

    Reinforcement Learning, Not Fine-Tuning: Nemotron-Tool-N1 Trains LLMs to Use Tools with Minimal Supervision and Maximum Generalization

    May 13, 2025

    Equipping LLMs with external tools or functions has become popular, showing great performance across diverse domains. Existing research depends on synthesizing large volumes of tool-use trajectories through advanced language models and SFT to enhance LLMs’ tool-calling capability. The critical limitation lies in the synthetic datasets’ inability to capture explicit reasoning steps, resulting in superficial tool call training. In many cases, reasoning is either completely omitted during the training or deferred to inference through prompting techniques. This results in pseudo-reasoning: models merely learn to mimic surface-level patterns without truly understanding the underlying decision-making process.

    Existing research explores multiple approaches to enhance LLMs’ tool-use capabilities. Previous methods have focused on two key strategies for improving tool learning. The first approach concentrated on dataset curation and model refinement, involving the creation of large-scale supervised datasets and applying advanced training techniques such as SFT and DPO reinforcement learning. LLMs are combined with various external tools, including search engines, calculators, vision tools, and Python interpreters, to expand their functional capabilities. The second approach targeted reasoning improvement, shifting from traditional train-time scaling to more complex test-time scaling strategies. Earlier methods relied on step-level supervision and learned reward models to guide reasoning trajectories.

    Researchers from NVIDIA, Pennsylvania State University, and the University of Washington have proposed the Nemotron-Research-Tool-N1 series to address the limitations of existing tool-use methods. It diverges from traditional SFT and reasoning trace distillation techniques by implementing a unique RL paradigm. Drawing inspiration from DeepSeek-R1’s success, a lightweight supervision method has been developed to focus on the structural validity and functional correctness evaluation of tool invocations. The Nemotron-Research-Tool-N1 model employs a binary reward mechanism that enables the model to autonomously develop reasoning strategies without relying on explicitly annotated reasoning trajectories.

    Researchers unify and preprocess data from existing tool-calling datasets, xLAM, and a subset of ToolACE, which provide single-turn and multi-turn synthetic tool-calling trajectories. A lightweight prompting template is created to guide tool call generation, featuring explicit instructions for intermediate reasoning within <think>…</think> tags and tool invocation enclosed in <tool_call>…</tool_call>. The template helps to minimize rigid formatting constraints and reduce the risk of overfitting to specific prompt patterns. The primary backbone model utilized is Qwen2.5-7B/14B-Instruct, and to evaluate the generalization ability of the proposed method, evaluations are performed on alternative backbone models, including multiple variants from the LLaMA family.

    Results on the BFCL and API-Bank benchmarks show Nemotron-Research-Tool-N1 models’ superior performance. On the BFCL benchmark, the Tool-N1-7B/14B models outperform closed-source models like GPT-4o and specialized fine-tuned models such as xLAM-2-70B and ToolACE-8B. The models surpass SFT baselines trained on identical data sources, highlighting the effectiveness of the R1-style RL approach. Further, the API-Bank benchmark validates these findings, with Tool-N1-7B/14B achieving 4.12% and 5.03% higher accuracy than GPT-4o. These results conclusively demonstrate the potential of the proposed method in enhancing large language models’ tool-calling capabilities through a novel reinforcement learning paradigm.

    In conclusion, researchers introduced Nemotron-Research-Tool-N1, a significant advancement in LLM tool-use capabilities. The research shows a paradigm shift from traditional SFT methodologies by introducing a novel rule-based RL approach. The proposed method enables models to develop sophisticated reasoning strategies without relying on explicitly annotated reasoning trajectories. Benchmark evaluations across BFCL and API-Bank consistently validate the approach’s effectiveness, showing substantial performance improvements over existing baselines. The findings open new avenues for developing more adaptable and intelligent language models that can autonomously generate reasoning strategies.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 90k+ ML SubReddit.

    Here’s a brief overview of what we’re building at Marktechpost:

    • ML News Community – r/machinelearningnews (92k+ members)
    • Newsletter– airesearchinsights.com/(30k+ subscribers)
    • miniCON AI Events – minicon.marktechpost.com
    • AI Reports & Magazines – magazine.marktechpost.com
    • AI Dev & Research News – marktechpost.com (1M+ monthly readers)
    • Partner with us

    The post Reinforcement Learning, Not Fine-Tuning: Nemotron-Tool-N1 Trains LLMs to Use Tools with Minimal Supervision and Maximum Generalization appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticlePwC Releases Executive Guide on Agentic AI: A Strategic Blueprint for Deploying Autonomous Multi-Agent Systems in the Enterprise
    Next Article A Step-by-Step Guide to Deploy a Fully Integrated Firecrawl-Powered MCP Server on Claude Desktop with Smithery and VeryaX

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    August 13, 2025
    Machine Learning

    Nebius AI Advances Open-Weight LLMs Through Reinforcement Learning for Capable SWE Agents

    August 13, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    South African man imprisoned after ransom demand against his former employer

    Development

    Secret Diary of a Billionaire AI Bot

    Artificial Intelligence

    “Fear not—we are cooking!” Helldivers 2 devs say there’s “exciting news to come” and a new Warbond in May as we defeat the Illuminate, which surely means it’s about to invade for real and kill us all

    News & Updates

    CVE-2025-46750 – SELogic BIOS Password Bypass Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Highlights

    News & Updates

    Save $400 on the best Samsung TVs, laptops, tablets, and more when you sign up for Verizon 5G Home or Home Internet

    May 17, 2025

    You can get $400 off Samsung gear like Galaxy laptops and OLED TVs when you…

    I just found the cheapest laptop worth actually buying in this anti-Prime day deal — under $300 and not absolute garbage

    July 10, 2025

    CVE-2025-52377 – Nexxt Solutions NCM-X1800 Mesh Router Command Injection Vulnerability

    July 15, 2025

    Malicious PyPI Package Posing as Solana Tool Stole Source Code in 761 Downloads

    May 14, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.