Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      How To Prevent WordPress SQL Injection Attacks

      June 14, 2025

      This week in AI dev tools: Apple’s Foundations Model framework, Mistral’s first reasoning model, and more (June 13, 2025)

      June 13, 2025

      Open Talent platforms emerging to match skilled workers to needs, study finds

      June 13, 2025

      Java never goes out of style: Celebrating 30 years of the language

      June 12, 2025

      6 registry tweaks every tech-savvy user must apply on Windows 11

      June 14, 2025

      Here’s why network infrastructure is vital to maximizing your company’s AI adoption

      June 14, 2025

      The AI video tool behind the most viral social trends right now

      June 14, 2025

      Got a new password manager? How to clean up the password mess you left in the cloud

      June 14, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Right Invoicing App for iPhone: InvoiceTemple

      June 14, 2025
      Recent

      Right Invoicing App for iPhone: InvoiceTemple

      June 14, 2025

      Tunnel Run game in 170 lines of pure JS

      June 14, 2025

      Integrating Drupal with Salesforce SSO via SAML and Dynamic User Sync

      June 14, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Windows 11 24H2 tests toggle to turn off Recommended feed in the Start menu

      June 14, 2025
      Recent

      Windows 11 24H2 tests toggle to turn off Recommended feed in the Start menu

      June 14, 2025

      User calls Windows 11 “pure horror,” Microsoft says it’s listening to feedback

      June 14, 2025

      John the Ripper is an advanced offline password cracker

      June 14, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»OpenThoughts: A Scalable Supervised Fine-Tuning SFT Data Curation Pipeline for Reasoning Models

    OpenThoughts: A Scalable Supervised Fine-Tuning SFT Data Curation Pipeline for Reasoning Models

    June 14, 2025

    The Growing Complexity of Reasoning Data Curation

    Recent reasoning models, such as DeepSeek-R1 and o3, have shown outstanding performance in mathematical, coding, and scientific areas, utilizing post-training techniques like supervised fine-tuning (SFT) and reinforcement learning (RL). However, the complete methodologies behind these frontier reasoning models are not public, which makes research for building reasoning models difficult. While SFT data curation has become a powerful approach for developing strong reasoning capabilities, most existing efforts explore only limited design choices, such as relying solely on human-written questions or single teacher models. Moreover, exploring the extensive design space of various techniques for generating question-answer pairs requires high costs for teacher inference and model training.

    Reasoning traces provided by models such as Gemini, QwQ, and DeepSeek-R1 have enabled knowledge distillation techniques to train smaller reasoning models. Projects like OpenR1, OpenMathReasoning, and OpenCodeReasoning collect questions from public forums and competition sites, while Natural Reasoning utilizes pre-training corpora as seed data. Some efforts, such as S1 and LIMO, focus on manually curating small, high-quality datasets of challenging prompts. Other methods, such as DeepMath-103K and Nvidia Nemotron, introduce innovations across data sourcing, filtering, and scaling stages. RL methods, including AceReason and Skywork-OR1, have enhanced reasoning capabilities beyond traditional SFT methods.

    OpenThoughts: A Scalable Framework for SFT Dataset Development

    Researchers from Stanford University, the University of Washington, BespokeLabs.ai, Toyota Research Institute, UC Berkeley, and 12 additional organizations have proposed OpenThoughts, a new SOTA open reasoning data recipe. OpenThoughts uses a progressive approach across three iterations: OpenThoughts-114K scales the Sky-T1 pipeline with automated verification, OpenThoughts2-1M enhances data scale through augmented question diversity and synthetic generation strategies, and OpenThoughts3-1.2M incorporates findings from over 1,000 ablation experiments to develop a simple, scalable, and high-performing data curation pipeline. Moreover, the model OpenThinker3-7B achieves state-of-the-art performance among open-data models at the 7B scale.

    The OpenThoughts3-1.2M is built by ablating each pipeline component independently while maintaining constant conditions across other stages, generating 31,600 data points per strategy and fine-tuning Qwen2.5-7B-Instruct on each resulting dataset. The goal during training is to create the best dataset of question-response pairs for SFT reasoning. Evaluation occurs across eight reasoning benchmarks across mathematics (AIME24, AMC23, MATH500), coding (CodeElo, CodeForces, LiveCodeBench), and science (GPQA Diamond, JEEBench). The experimental design includes a rigorous decontamination process to remove high-similarity samples and maintains a held-out benchmark set for generalization testing. Evalchemy serves as the primary evaluation tool, ensuring consistent evaluation protocols.

    Evaluation Insights and Benchmark Performance

    The OpenThoughts pipeline evaluation reveals key insights across question sourcing, mixing, filtering, answer filtering, and the teacher model. Question sourcing experiments show that CodeGolf and competitive coding questions achieve the highest performance for code tasks (25.3-27.5 average scores), while LLM-generated and human-written questions excel in mathematics (58.8-58.5 scores), and physics StackExchange questions with chemistry textbook extractions perform best in science (43.2-45.3 scores). Mixing question shows that combining multiple question sources degrades performance, with optimal results of 5% accuracy improvements over diverse mixing strategies. In the teacher model, QwQ-32B outperforms DeepSeek-R1 in knowledge distillation, achieving an accuracy improvement of 1.9-2.6%.

    In conclusion, researchers present the OpenThoughts project, showing that systematic experimentation can significantly advance SFT data curation for reasoning models. Researchers developed OpenThoughts3-1.2M, a state-of-the-art open-data reasoning dataset across science, mathematics, and coding domains. The resulting OpenThinker3-7B model achieves superior performance among open-data reasoning models at its scale. However, several limitations remain unexplored, including RL approaches, staged fine-tuning, and curriculum learning strategies. Future research directions include investigating cross-domain transfer effects when optimizing individual domains versus overall performance, and understanding the scaling dynamics as student models approach teacher capabilities.


    Check out the Paper, Project Page and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 99k+ ML SubReddit and Subscribe to our Newsletter.

    The post OpenThoughts: A Scalable Supervised Fine-Tuning SFT Data Curation Pipeline for Reasoning Models appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleHighlighted at CVPR 2025: Google DeepMind’s ‘Motion Prompting’ Paper Unlocks Granular Video Control
    Next Article Can You Build Your Dream Website Using AI? These Tools Say You Can

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    June 14, 2025
    Machine Learning

    MemOS: A Memory-Centric Operating System for Evolving and Adaptive Large Language Models

    June 14, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    CVE-2025-4311 – iSourcecode Content Management System SQL Injection Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    CVE-2025-5850 – Tenda AC15 HTTP POST Request Handler Buffer Overflow Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    CVE-2025-46779 – Apache HTTP Server HTTP Header Injection

    Common Vulnerabilities and Exposures (CVEs)

    Windows 11’s Android (WSA) finally loses support, but can you still install it?

    Operating Systems

    Highlights

    Securing the Future: Best Practices for Data Privacy in AI Projects🔐

    May 15, 2025

    Post Content Source: Read More 

    Adobe adviseert webshops kritiek beveiligingslek binnen 72 uur te patchen

    June 11, 2025

    CVE-2025-48999 – DataEase SQL Injection Vulnerability

    June 3, 2025

    Grand Theft Auto 6 has been DELAYED — will now miss 2025 launch window entirely

    May 2, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.