
NVIDIA AI Releases ProRLv2: Advancing Reasoning in Language Models with Extended Reinforcement Learning (RL)

    August 12, 2025

    Table of contents

    • What Is ProRLv2?
    • Key Innovations in ProRLv2
    • How ProRLv2 Expands LLM Reasoning
    • Why It Matters
    • Using Nemotron-Research-Reasoning-Qwen-1.5B-v2
    • Conclusion

    What Is ProRLv2?

    ProRLv2 is the latest version of NVIDIA’s Prolonged Reinforcement Learning (ProRL), designed specifically to push the boundaries of reasoning in large language models (LLMs). By scaling reinforcement learning (RL) steps from 2,000 up to 3,000, ProRLv2 systematically tests how extended RL can unlock new solution spaces, creativity, and high-level reasoning that were previously inaccessible—even with smaller models like the 1.5B-parameter Nemotron-Research-Reasoning-Qwen-1.5B-v2.

    Key Innovations in ProRLv2

    ProRLv2 incorporates several innovations to overcome common RL limitations in LLM training:

    • REINFORCE++-Baseline: A robust RL algorithm that enables long-horizon optimization over thousands of steps, handling the instability typical of RL for LLMs.
    • KL Divergence Regularization & Reference Policy Reset: Periodically refreshes the reference model with the current best checkpoint, allowing stable progress and continued exploration by preventing the RL objective from dominating too early.
    • Decoupled Clipping & Dynamic Sampling (DAPO): Encourages diverse solution discovery by boosting unlikely tokens and focusing learning signals on prompts of intermediate difficulty.
    • Scheduled Length Penalty: Cyclically applied, helping maintain diversity and prevent entropy collapse as training lengthens.
    • Scaling Training Steps: ProRLv2 moves the RL training horizon from 2,000 to 3,000 steps, directly testing how much longer RL can expand reasoning abilities.
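
    The interplay of decoupled clipping and KL regularization can be sketched with a toy per-token objective. This is purely illustrative: the clip ranges, KL coefficient, and function shape below are assumptions for exposition, not ProRLv2's published hyperparameters or implementation:

    ```python
    import math

    def prorl_style_loss(logp_new, logp_old, logp_ref, advantage,
                         eps_low=0.2, eps_high=0.28, kl_coef=0.001):
        """Toy per-token objective: a PPO-style clipped surrogate with a
        decoupled clip range (eps_high > eps_low, as in DAPO) plus a KL
        penalty toward a reference policy. All constants are illustrative."""
        ratio = math.exp(logp_new - logp_old)
        # Decoupled clipping: a wider upper bound lets currently unlikely
        # tokens (ratio > 1) gain probability mass faster than with the
        # symmetric PPO clip, encouraging diverse solution discovery.
        clipped = max(min(ratio, 1.0 + eps_high), 1.0 - eps_low)
        surrogate = min(ratio * advantage, clipped * advantage)
        # The KL term pulls the policy toward the reference checkpoint;
        # periodically resetting that reference to the current best model
        # keeps this penalty from freezing progress over long runs.
        kl_penalty = kl_coef * (logp_new - logp_ref)
        return -(surrogate - kl_penalty)

    loss = prorl_style_loss(logp_new=-1.0, logp_old=-1.5,
                            logp_ref=-1.2, advantage=1.0)
    ```

    In a real trainer this would operate on batched log-probability tensors with a proper KL estimator; the scalar version only shows how the two regularizers combine in one loss.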

    How ProRLv2 Expands LLM Reasoning

    Nemotron-Research-Reasoning-Qwen-1.5B-v2, trained with ProRLv2 for 3,000 RL steps, sets a new standard for open-weight 1.5B models on reasoning tasks, including math, code, science, and logic puzzles:

    • Performance surpasses previous versions and competitors like DeepSeek-R1-1.5B.
    • Sustained gains with more RL steps: Longer training leads to continual improvements, especially on tasks where base models perform poorly, demonstrating genuine expansion in reasoning boundaries.
    • Generalization: Not only does ProRLv2 boost pass@1 accuracy, but it also enables novel reasoning and solution strategies on tasks not seen during training.
    • Benchmarks: Gains include average pass@1 improvements of 14.7% in math, 13.9% in coding, 54.8% in logic puzzles, 25.1% in STEM reasoning, and 18.1% in instruction-following tasks, with further improvements in v2 on unseen and harder benchmarks.

    Why It Matters

    The major finding of ProRLv2 is that continued RL training, with careful exploration and regularization, reliably expands what LLMs can learn and generalize. Rather than hitting an early plateau or overfitting, prolonged RL allows smaller models to rival much larger ones in reasoning—demonstrating that scaling RL itself is as important as model or dataset size.

    Using Nemotron-Research-Reasoning-Qwen-1.5B-v2

    The latest checkpoint is available for testing on Hugging Face. Loading the model:

    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Checkpoint id as given in the source announcement
    tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-Research-Reasoning-Qwen-1.5B")
    model = AutoModelForCausalLM.from_pretrained("nvidia/Nemotron-Research-Reasoning-Qwen-1.5B")

    # Example generation (the prompt is illustrative)
    inputs = tokenizer("Solve: 12 * 7 = ?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

    Conclusion

    ProRLv2 redefines the limits of reasoning in language models by showing that RL scaling laws matter as much as size or data. Through advanced regularization and smart training schedules, it enables deep, creative, and generalizable reasoning even in compact architectures. The future lies in how far RL can push—not just how big models can get.


    The post NVIDIA AI Releases ProRLv2: Advancing Reasoning in Language Models with Extended Reinforcement Learning (RL) appeared first on MarkTechPost.
