Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      A Week In The Life Of An AI-Augmented Designer

      August 22, 2025

      This week in AI updates: Gemini Code Assist Agent Mode, GitHub’s Agents panel, and more (August 22, 2025)

      August 22, 2025

      Microsoft adds Copilot-powered debugging features for .NET in Visual Studio

      August 21, 2025

      Blackstone portfolio company R Systems Acquires Novigo Solutions, Strengthening its Product Engineering and Full-Stack Agentic-AI Capabilities

      August 21, 2025

      I found the ultimate MacBook Air alternative for Windows users – and it’s priced well

      August 23, 2025

      Outdated IT help desks are holding businesses back – but there is a solution

      August 23, 2025

      Android’s latest update can force apps into dark mode – how to see it now

      August 23, 2025

      I tried the Google Pixel Watch 4 – and these key features made it feel indispensable

      August 23, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Building Cross-Platform Alerts with Laravel’s Notification Framework

      August 23, 2025
      Recent

      Building Cross-Platform Alerts with Laravel’s Notification Framework

      August 23, 2025

      Add Notes Functionality to Eloquent Models With the Notable Package

      August 23, 2025

      How to install OpenPlatform — IoT platform

      August 22, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Basics of Digital Forensics

      August 22, 2025
      Recent

      Basics of Digital Forensics

      August 22, 2025

      Top Linux Server Automation Tools: Simplifying System Administration

      August 22, 2025

      Rising from the Ashes: How AlmaLinux and Rocky Linux Redefined the Post-CentOS Landscape

      August 22, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»This AI Paper Introduces FASTCURL: A Curriculum Reinforcement Learning Framework with Context Extension for Efficient Training of R1-like Reasoning Models

    This AI Paper Introduces FASTCURL: A Curriculum Reinforcement Learning Framework with Context Extension for Efficient Training of R1-like Reasoning Models

    April 3, 2025

    Large language models have transformed how machines comprehend and generate text, especially in complex problem-solving areas like mathematical reasoning. These systems, known as R1-like models, are designed to emulate slow and deliberate thought processes. Their key strength is handling intricate tasks requiring step-by-step reasoning across long sequences. These capabilities make them valuable for applications such as solving Olympiad-level math problems or logical reasoning tasks, where depth and coherence of reasoning are essential.

    A significant challenge in training these models is the extensive computation for reinforcement learning using long context windows. Tasks that require multi-step logic force models to produce long outputs which consumes more resources and slows down learning. Further, not all long responses contribute meaningfully to accuracy; many include redundant reasoning. These inefficiencies in response generation and high GPU usage make it difficult to effectively scale training, particularly when working with models with 1.5 billion parameters.

    Previous attempts to address this issue include models like DeepScaleR, which uses a staged context length extension strategy during training. DeepScaleR starts with an 8K context window and expands gradually to 24K over three training phases. Although this approach helps guide the model to manage longer reasoning chains efficiently, it still demands approximately 70,000 A100 GPU hours. DeepScaleR reduces that to 3,800 hours through a progressive strategy but still requires considerable hardware, including setups with up to 32 GPUs in some stages. This shows that while improvements are possible, the solution remains costly and complex.

    Researchers at Tencent introduced a method called FASTCURL to overcome the inefficiencies of traditional reinforcement learning training. This method presents a curriculum-based strategy aligned with context window expansion. FASTCURL splits the dataset based on input prompt length into short, long, and combined categories. The training progresses in four stages, each using a different dataset and context window setting. This approach ensures the model learns simple reasoning before advancing to longer, more complex reasoning steps. The researchers emphasize that the entire training process runs on a single node with just 8 GPUs, reducing setup complexity.

    The approach involves a deliberate segmentation of data by input length, driven by the hypothesis that longer prompts usually lead to longer and more complex outputs. The model first learns using short prompts under an 8K window. As training proceeds, the model transitions to a mixed dataset with 16K window length, then to the long dataset with the same window size, and finally reviews the combined data again. Each stage is trained for one iteration, and FASTCURL requires about 860 training steps. This is efficient compared to DeepScaleR’s 1,750 steps, representing a 50% reduction in training time and resource usage while maintaining effectiveness.

    In performance evaluations, FASTCURL-1.5B-Preview showed improvements over other models across five benchmarks. It scored 88.0 on MATH 500, 43.1 on AIME 2024, 74.2 on AMC 2023, 31.6 on Minerva Math, and 50.4 on OlympiadBench, with an average PASS@1 score of 57.5. Compared to DeepScaleR-1.5B-Preview, which scored an average of 57.0, FASTCURL performed better in four of five datasets. These results highlight that FASTCURL can outperform existing techniques while consuming significantly fewer resources. The model also showed better generalization, particularly on datasets like AMC 2023 and Minerva Math, indicating robustness.

    The research clearly outlines a computational problem in training R1-like reasoning models and offers an innovative curriculum strategy as a solution. The method provides an efficient and practical training framework by combining input-based data segmentation with context expansion. FASTCURL delivers strong performance using fewer steps and limited hardware, proving that strategic training design can be as powerful as raw computational scale.


    Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 85k+ ML SubReddit.

    🔥 [Register Now] miniCON Virtual Conference on OPEN SOURCE AI: FREE REGISTRATION + Certificate of Attendance + 3 Hour Short Event (April 12, 9 am- 12 pm PST) + Hands on Workshop [Sponsored]

    The post This AI Paper Introduces FASTCURL: A Curriculum Reinforcement Learning Framework with Context Extension for Efficient Training of R1-like Reasoning Models appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleResearchers from Dataocean AI and Tsinghua University Introduces Dolphin: A Multilingual Automatic Speech Recognition ASR Model Optimized for Eastern Languages and Dialects
    Next Article Introduction to MCP: The Ultimate Guide to Model Context Protocol for AI Assistants

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    August 23, 2025
    Machine Learning

    Checklists Are Better Than Reward Models For Aligning Language Models

    August 23, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    CodeRabbit brings AI-powered code review into Visual Studio Code

    Tech & Work

    Redpanda Announces the General Availability of Apache Iceberg Topics for the Enterprise

    Tech & Work

    Control Statements in C: A Comprehensive Guide

    Development

    Samsung Partners With EA and Xbox to Bring FC 25 to Gaming Hub

    Operating Systems

    Highlights

    CVE-2025-31712 – Cisco cplog Out-of-Bounds Write Vulnerability

    June 3, 2025

    CVE ID : CVE-2025-31712

    Published : June 3, 2025, 6:15 a.m. | 1 hour, 12 minutes ago

    Description : In cplog service, there is a possible out of bounds write due to a missing bounds check. This could lead to local denial of service with no additional execution privileges needed.

    Severity: 5.1 | MEDIUM

    Visit the link for more details, such as CVSS details, affected products, timeline, and more…

    How to Build a Lean Security Model: 5 Lessons from River Island

    June 11, 2025

    Meta Resumes E.U. AI Training Using Public User Data After Regulator Approval

    April 15, 2025

    CVE-2025-23178 – Apache HTTP Server SSL/TLS Channel Hijacking Vulnerability

    April 29, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.