
    CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs

    June 12, 2025

    Introduction

Large Language Models (LLMs) have shown substantial improvements in reasoning and precision through reinforcement learning (RL) and test-time scaling techniques. Although existing approaches such as O1-Coder and UTGEN outperform traditional unit test generation methods, most of them require supervision from ground-truth code. This supervision increases data collection costs and limits the scale of usable training data.

    Limitations of Existing Approaches

    Conventional unit test generation relies on:

    • Software analysis methods, which are rule-based and rigid.
    • Neural machine translation techniques, which often lack semantic alignment.

    While recent prompt-based and agentic methods improve performance, they still depend heavily on labeled code for fine-tuning. This reliance restricts adaptability and scalability, particularly in real-world, large-scale deployment scenarios.

    CURE: A Self-Supervised Co-Evolutionary Approach

    Researchers from the University of Chicago, Princeton University, Peking University, and ByteDance Seed introduce CURE, a self-supervised reinforcement learning framework that jointly trains a code generator and a unit test generator without any ground-truth code.

    CURE operates using a self-play mechanism in which:

    • The LLM generates both correct and incorrect code.
    • The unit test generator learns to distinguish failure modes and refines itself accordingly.

    This bidirectional co-evolution enhances both code generation and verification without external supervision.
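The scoring step at the heart of this loop can be illustrated with a minimal sketch: execute each sampled solution against each sampled test and record a pass/fail matrix, which then drives the rewards for both models. The Python harness below is an illustrative assumption, not the paper's actual implementation (a real system would also need sandboxing and timeouts).

```python
# Minimal sketch of the self-play scoring step, assuming candidate code and
# unit tests arrive as plain Python source strings (function names and the
# I/O format here are illustrative, not from the paper).
from typing import List

def passes(code: str, test: str) -> bool:
    """Run one generated unit test against one candidate solution."""
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate solution
        exec(test, namespace)   # test raises AssertionError on failure
        return True
    except Exception:
        return False

def pass_matrix(codes: List[str], tests: List[str]) -> List[List[bool]]:
    """pass_matrix[i][j] is True if code i passes test j. This matrix is the
    only supervision signal: it rewards the coder for passing tests and the
    tester for separating correct from incorrect code."""
    return [[passes(c, t) for t in tests] for c in codes]

# Toy example: one correct and one buggy solution, one generated test.
codes = ["def add(a, b):\n    return a + b",
         "def add(a, b):\n    return a - b"]
tests = ["assert add(2, 3) == 5"]
print(pass_matrix(codes, tests))  # [[True], [False]]
```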

    Architecture and Methodology

    Base Models and Sampling Strategy

    CURE is built on Qwen2.5-7B and 14B Instruct models, with Qwen3-4B used for long-chain-of-thought (CoT) variants. Each training step samples:

    • 16 candidate code completions.
    • 16 task-derived unit tests.

    Sampling is performed using vLLM with temperature 1.0 and top-p 1.0. For long-CoT models, a response-length-aware transformation penalizes lengthy outputs, improving inference-time efficiency.
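As a rough sketch of that sampling step with vLLM (the temperature, top-p, and sample count follow the description above; the model path, prompts, and max_tokens value are illustrative assumptions):

```python
# Hedged sketch of sampling 16 candidate codes and 16 unit tests with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(temperature=1.0, top_p=1.0, n=16, max_tokens=1024)

# One call each for candidate code and task-derived unit tests.
code_outputs = llm.generate(["Write a Python function that ..."], params)
test_outputs = llm.generate(["Write unit tests for a function that ..."], params)
completions = [o.text for o in code_outputs[0].outputs]  # 16 candidate codes
```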

    Reward Function and Optimization

    CURE introduces a mathematically grounded reward formulation to:

    • Maximize reward precision, defined as the likelihood that correct code scores higher than incorrect code across generated unit tests.
    • Apply length-aware reward adjustments that penalize long responses, reducing inference latency.

    Optimization proceeds via policy gradient methods, jointly updating the coder and unit tester to improve their mutual performance.
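The precision objective can be made concrete with a small sketch: a candidate's score is the number of generated tests it passes, and precision is the fraction of (correct, incorrect) code pairs that the test suite ranks in the right order. How correctness labels are obtained during training is an assumption here, not spelled out above.

```python
# Hedged sketch of the reward-precision objective: across sampled code, how
# often does a correct solution out-score an incorrect one under the
# generated unit tests?
from itertools import product
from typing import List

def score(code_passes: List[bool]) -> int:
    """Score a candidate by the number of generated tests it passes."""
    return sum(code_passes)

def reward_precision(pass_matrix: List[List[bool]],
                     is_correct: List[bool]) -> float:
    """Fraction of (correct, incorrect) code pairs ranked correctly by the
    generated test suite. Higher is better for the unit test generator."""
    correct = [i for i, ok in enumerate(is_correct) if ok]
    incorrect = [i for i, ok in enumerate(is_correct) if not ok]
    pairs = list(product(correct, incorrect))
    if not pairs:
        return 0.0
    wins = sum(score(pass_matrix[c]) > score(pass_matrix[w]) for c, w in pairs)
    return wins / len(pairs)

# Toy check: a discriminative test suite ranks the correct solution first.
matrix = [[True, True], [False, True]]          # code 0 passes both tests
print(reward_precision(matrix, [True, False]))  # 1.0
```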

    Benchmark Datasets and Evaluation Metrics

    CURE is evaluated on five standard coding datasets:

    • LiveBench
    • MBPP
    • LiveCodeBench
    • CodeContests
    • CodeForces

    Performance is measured across:

    • Unit test accuracy
    • One-shot code generation accuracy
    • Best-of-N (BoN) accuracy using 16 code and test samples.
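For intuition, BoN selection with generated tests can be sketched as picking whichever of the N candidate codes passes the most of the N sampled unit tests (tie-breaking here is an arbitrary assumption):

```python
# Sketch of Best-of-N selection: choose the candidate that passes the
# largest number of generated unit tests.
from typing import List

def best_of_n(codes: List[str], pass_matrix: List[List[bool]]) -> str:
    """pass_matrix[i][j] says whether code i passes test j; ties are broken
    by index, an arbitrary choice in this sketch."""
    pass_counts = [sum(row) for row in pass_matrix]
    best = max(range(len(codes)), key=lambda i: pass_counts[i])
    return codes[best]
```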

    Performance and Efficiency Gains

    The ReasonFlux-Coder models derived via CURE achieve:

    • +37.8% in unit test accuracy.
    • +5.3% in one-shot code generation accuracy.
    • +9.0% in BoN accuracy.

    Notably, ReasonFlux-Coder-4B achieves a 64.8% reduction in average unit test response length, substantially improving inference speed. Across all benchmarks, these models outperform traditional coding-supervised fine-tuned models (e.g., Qwen2.5-Coder-Instruct).

    Application to Commercial LLMs

    When ReasonFlux-Coder-4B is paired with GPT-series models:

    • GPT-4o-mini gains +5.5% BoN accuracy.
    • GPT-4.1-mini improves by +1.8%.
    • API costs are reduced while performance is enhanced, indicating a cost-effective solution for production-level inference pipelines.

    Use as Reward Model for Label-Free Fine-Tuning

    CURE-trained unit test generators can be repurposed as reward models in RL training. Using ReasonFlux-Coder-4B’s generated unit tests yields improvements comparable to those from human-labeled test supervision, enabling fully label-free reinforcement learning pipelines.
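A minimal sketch of that reward-model usage, assuming hypothetical generate_tests and passes callables backed by the trained tester and a code sandbox (neither is an API from the paper):

```python
# Sketch of repurposing the trained unit tester as a label-free reward model:
# reward a rollout's code by its pass rate on tester-generated tests, with no
# human-written tests involved.
from typing import Callable, List

def label_free_reward(code: str,
                      task_prompt: str,
                      generate_tests: Callable[[str], List[str]],
                      passes: Callable[[str, str], bool]) -> float:
    """Scalar reward in [0, 1]: fraction of tester-generated unit tests
    (e.g. from ReasonFlux-Coder-4B) that the candidate code passes."""
    tests = generate_tests(task_prompt)
    if not tests:
        return 0.0
    return sum(passes(code, t) for t in tests) / len(tests)
```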

    Broader Applicability and Future Directions

    Beyond BoN, ReasonFlux-Coder models integrate seamlessly with agentic coding frameworks like:

    • MPSC (Multi-Perspective Self-Consistency)
    • AlphaCodium
    • S*

    These systems benefit from CURE’s ability to refine both code and tests iteratively. CURE also boosts agentic unit test generation accuracy by over 25.1%, reinforcing its versatility.

    Conclusion

    CURE represents a significant advancement in self-supervised learning for code generation and validation, enabling large language models to jointly evolve their coding and unit test generation capabilities without reliance on ground-truth code. By leveraging a co-evolutionary reinforcement learning framework, CURE not only enhances core performance metrics such as one-shot accuracy and Best-of-N selection but also improves inference efficiency through response-length-aware optimization. Its compatibility with existing agentic coding pipelines and ability to function as a label-free reward model make it a scalable and cost-effective solution for both training and deployment scenarios.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

    Source: MarkTechPost