
    Microsoft Researchers Introduce ARTIST: A Reinforcement Learning Framework That Equips LLMs with Agentic Reasoning and Dynamic Tool Use

    May 10, 2025

    LLMs have made impressive gains in complex reasoning, primarily through innovations in architecture, scale, and training approaches such as reinforcement learning (RL). RL enhances LLMs by using reward signals to guide the model toward more effective reasoning strategies, resulting in longer and more coherent thought processes that adapt dynamically to a task’s complexity. Despite this, most RL-enhanced LLMs rely heavily on static internal knowledge and text-only reasoning, making them ill-suited for tasks requiring real-time information, domain-specific expertise, or precise computations. This limitation is especially evident in knowledge-intensive or open-ended problems, where the inability to access and interact with external tools leads to inaccuracies or hallucinations.

    To overcome these constraints, recent work has explored agentic reasoning, where LLMs dynamically engage with external tools and environments during the reasoning process. These tools include web search, APIs, and code execution platforms, while environments range from simulated browsers to operating systems. Agentic reasoning enables models to plan, adapt, and solve tasks interactively, beyond static inference. However, current methods for tool integration often depend on manually designed prompts or supervised fine-tuning, which hinder scalability and generalization. Emerging reinforcement learning techniques like Group Relative Policy Optimization (GRPO) provide more efficient and adaptive training for tool use without step-level supervision. Yet, the intersection of RL, tool use, and agentic decision-making remains underexplored, particularly in real-world tasks that demand multi-turn reasoning, dynamic planning, and robust external interaction. 

    Microsoft Research introduces ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a framework that combines agentic reasoning, reinforcement learning, and dynamic tool use to enhance LLMs. ARTIST enables models to autonomously decide when, how, and which tools to use during multi-step reasoning, learning robust strategies without step-level supervision. Tool queries and their outputs are integrated directly into the reasoning chain, which improves both reasoning and interaction with external environments. Evaluated on challenging math and function-calling benchmarks, ARTIST outperforms top models like GPT-4o, with gains of up to 22% over base models. It also demonstrates emergent agentic behaviors, pointing toward more generalizable and interpretable problem-solving.
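To make the interleaving of reasoning, tool queries, and tool outputs concrete, here is a minimal sketch of one rollout loop. It is not ARTIST's implementation: the tag names (`<think>`, `<tool name="...">`, `<output>`, `<answer>`), the `run_rollout` helper, the toy `calculator` tool, and the `scripted_model` stand-in for an LLM are all assumptions made for illustration.

```python
import ast
import operator
import re
from typing import Callable, Dict, Optional

# Hypothetical markup; the article does not specify the exact tags ARTIST uses.
TOOL_RE = re.compile(r'<tool name="(\w+)">(.*?)</tool>', re.S)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.S)
OUTPUT_RE = re.compile(r"<output>(.*?)</output>", re.S)

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expr: str) -> str:
    """Toy tool: evaluate +, -, *, / over numeric literals without eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expr.strip(), mode="eval").body))


def run_rollout(generate: Callable[[str], str],
                tools: Dict[str, Callable[[str], str]],
                prompt: str, max_turns: int = 4) -> Dict[str, Optional[str]]:
    """Alternate model segments and tool outputs until an answer appears."""
    context = prompt
    for _ in range(max_turns):
        segment = generate(context)          # model decides: reason, call a tool, or answer
        context += segment
        answer = ANSWER_RE.search(segment)
        if answer:
            return {"answer": answer.group(1).strip(), "transcript": context}
        call = TOOL_RE.search(segment)
        if call:
            name, query = call.group(1), call.group(2)
            try:
                result = tools[name](query)
            except Exception as exc:         # failed calls are fed back as text too
                result = f"error: {exc}"
            context += f"<output>{result}</output>"
    return {"answer": None, "transcript": context}


def scripted_model(context: str) -> str:
    """Stand-in for an LLM: calls the tool once, then answers from its output."""
    seen = OUTPUT_RE.search(context)
    if seen is None:
        return '<think>I should compute this.</think><tool name="calculator">17 * 23</tool>'
    return f"<think>The tool returned {seen.group(1)}.</think><answer>{seen.group(1)}</answer>"


result = run_rollout(scripted_model, {"calculator": calculator}, "What is 17 * 23? ")
print(result["answer"])
```

In a real system `generate` would be a call into the policy model, and the choice of whether to emit a tool call at all is what training is meant to shape; the loop itself stays this simple.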

    ARTIST is a flexible framework that enables LLMs to interact with external tools and environments using reinforcement learning. It alternates between reasoning and tool use, allowing the model to choose when and how to invoke tools like code interpreters or APIs. Training uses GRPO, which avoids value functions and uses outcome-based group rewards. ARTIST structures rollouts into reasoning, tool queries, tool outputs, and final answers, with a composite reward system encouraging correctness, proper format, and successful tool use, enabling adaptive, multi-step problem-solving. 
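The two training ingredients named above, a composite reward and GRPO's group-based scoring, can be sketched as follows. This is an illustrative approximation, not the paper's code: the reward weights, the tag names, and the `composite_reward` and `group_relative_advantages` helpers are assumptions. The advantage computation follows the usual GRPO formulation, in which each rollout's reward is normalized against the mean and standard deviation of its own sampled group instead of a learned value function.

```python
import re
from statistics import mean, pstdev
from typing import List


def composite_reward(rollout: str, reference: str) -> float:
    """Score one rollout on correctness, format, and tool success.

    The weights (2.0, 0.5, 0.5) are illustrative assumptions.
    """
    answer = re.search(r"<answer>(.*?)</answer>", rollout, re.S)
    correct = 1.0 if answer and answer.group(1).strip() == reference else 0.0

    # Format: an answer block exists and every <think> tag is closed.
    opens, closes = rollout.count("<think>"), rollout.count("</think>")
    well_formed = 1.0 if answer and opens == closes and opens >= 1 else 0.0

    # Tool use: at least one tool output, and none of them is an error.
    outputs = re.findall(r"<output>(.*?)</output>", rollout, re.S)
    tool_ok = 1.0 if outputs and not any(o.startswith("error") for o in outputs) else 0.0

    return 2.0 * correct + 0.5 * well_formed + 0.5 * tool_ok


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four made-up rollouts sampled for the same prompt ("What is 17 * 23?").
group = [
    "<think>use tool</think><tool>17*23</tool><output>391</output><answer>391</answer>",
    "<think>guess</think><answer>400</answer>",
    "<think>use tool</think><tool>17*</tool><output>error: bad syntax</output><answer>391</answer>",
    "391",
]
rewards = [composite_reward(r, "391") for r in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Because the advantages are relative to the group, a rollout is reinforced only to the extent that it beats its siblings, which is how outcome-level rewards can shape tool-use behavior without per-step labels.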

    ARTIST outperforms various baselines, including GPT-4o and tool-augmented LLMs, on complex mathematical benchmarks like AMC, AIME, and Olympiad. It achieves higher Pass@1 accuracy, with notable gains of up to 22% over base models and over 35% compared to other tool-integrated methods. ARTIST’s advantage comes from its agentic reinforcement learning, which enables it to use external tools strategically and refine multi-step solutions. Compared to prompt-based tool usage, it shows superior tool invocation, response quality, and reasoning depth. While its benefits are most evident in complex tasks, ARTIST also significantly improves results on simpler datasets like MATH-500 through selective tool use.
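For readers unfamiliar with the metric, Pass@1 is the probability that a single sampled solution to a problem is correct, averaged over the benchmark. The sketch below uses the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), which reduces to c/n when k = 1. The sample counts are made up for illustration; the article does not say how many samples per problem were drawn in the ARTIST evaluation.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(per_problem: List[Tuple[int, int]]) -> float:
    """Average Pass@1 over problems, each given as (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, 1) for n, c in per_problem) / len(per_problem)


# Hypothetical results for three problems, four samples each.
results = [(4, 2), (4, 0), (4, 4)]
score = benchmark_pass_at_1(results)
print(f"Pass@1 = {score:.2f}")
```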

    In conclusion, ARTIST is a framework that combines agentic reasoning, reinforcement learning, and dynamic tool use to enhance the capabilities of LLMs. Unlike traditional prompt-based approaches, ARTIST enables models to autonomously plan, adapt, and solve complex tasks by interacting with external tools and environments. It learns effective tool-use strategies without step-by-step supervision, improving accuracy and enabling deeper reasoning. Evaluations on mathematical and function-calling benchmarks show significant performance gains. ARTIST also produces more interpretable reasoning paths and more robust behaviors. This work highlights agentic RL as a promising direction for creating more adaptive and capable AI systems.



    The post Microsoft Researchers Introduce ARTIST: A Reinforcement Learning Framework That Equips LLMs with Agentic Reasoning and Dynamic Tool Use appeared first on MarkTechPost.
