
    This AI Paper from CMU and Google DeepMind Studies the Role of Synthetic Data for Improving Math Reasoning Capabilities of LLMs

    June 30, 2024

    Large language models (LLMs) face a critical challenge in their training process: the impending scarcity of high-quality internet data. Predictions suggest that by 2026, the available pool of such data will be exhausted, forcing researchers to turn to model-generated or synthetic data for training. This shift presents both opportunities and risks. While some studies have shown that scaling up synthetic data can improve performance on complex reasoning tasks, others have revealed a concerning trend. Training on synthetic data can potentially lead to a downward spiral in model performance, amplifying biases, propagating misinformation, and reinforcing undesired stylistic properties. The core challenge lies in designing synthetic data that effectively addresses data scarcity without compromising the quality and integrity of the resulting models. This task is particularly daunting due to the current lack of understanding regarding how synthetic data influences LLM behavior.

    Researchers have explored various approaches to tackle LLM training challenges using synthetic data. Standard methods like teacher-forcing on expert data have shown limitations, particularly in math reasoning. Efforts to generate positive synthetic data aim to mimic high-quality training data, using sources like stronger teacher models and self-generated content. While this approach has shown promise, challenges persist in verifying the quality of synthetic math data. Concerns about bias amplification, model collapse, and overfitting on spurious steps remain. To mitigate these issues, researchers are investigating the use of negative model-generated responses to identify and unlearn problematic patterns in training data.

Researchers from Carnegie Mellon University, Google DeepMind, and MultiOn present a study investigating the impact of synthetic data on LLM math reasoning capabilities. The study examines both positive and negative synthetic data, finding that positive data improves performance, though with slower scaling rates than pretraining. Notably, self-generated positive responses often match the effectiveness of twice the amount of data from larger models. The researchers introduce a robust approach using negative synthetic data, contrasting it with positive data at critical steps. This technique, equivalent to per-step advantage-weighted reinforcement learning, can improve data efficiency by up to eight times compared to using only positive data. The study develops scaling laws for both data types on common reasoning benchmarks, offering valuable insights into optimizing synthetic data use for enhancing LLM performance in math reasoning tasks.

The detailed architecture of the proposed method involves several key components:

Synthetic Data Pipeline:
• Prompts capable models like GPT-4 and Gemini 1.5 Pro to generate new problems similar to real ones.
• Obtains solution traces with step-by-step reasoning for these problems.
• Implements a binary reward function to verify the correctness of solution traces.

Dataset Construction:
• Creates a positive synthetic dataset from correct problem–solution pairs.
• Generates positive and negative datasets using model-generated solutions.
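The pipeline and dataset-construction steps above can be sketched as follows. This is a minimal illustration: `binary_reward` here only checks the final answer line of a trace, standing in for whatever verifier the paper's binary reward function actually uses.

```python
def binary_reward(trace, answer):
    """Return 1 if the trace's final line matches the reference answer, else 0."""
    final_line = trace.strip().splitlines()[-1]
    return 1 if final_line == f"Answer: {answer}" else 0

def build_datasets(samples):
    """Split model-generated (problem, trace, answer) triples into
    positive (correct) and negative (incorrect) synthetic datasets."""
    positive, negative = [], []
    for problem, trace, answer in samples:
        if binary_reward(trace, answer):
            positive.append((problem, trace))
        else:
            negative.append((problem, trace))
    return positive, negative
```

In this framing, both datasets come out of the same generation pass; the negative set is a byproduct of verification rather than a separate collection effort.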

Learning Algorithms:
• Supervised Finetuning (SFT): trains on 𝒟syn using next-token prediction.
• Rejection Finetuning (RFT): uses the SFT policy to generate positive responses for 𝒟syn problems, then applies the next-token prediction loss on these self-generated positive responses.
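The RFT data-collection step can be sketched as below. `sample_fn` and `reward_fn` are hypothetical stand-ins for sampling from the SFT policy and for the binary correctness check; the paper's exact sampling budget and verifier are not specified here.

```python
def rejection_finetuning_data(problems, sample_fn, reward_fn, k=4):
    """Collect self-generated positive data for RFT: sample k solution traces
    per problem from the SFT policy and keep only the correct ones. The kept
    pairs are then trained on with the ordinary next-token prediction loss."""
    kept = []
    for problem in problems:
        for _ in range(k):
            trace = sample_fn(problem)        # sample from the SFT policy
            if reward_fn(problem, trace):     # binary reward: is the trace correct?
                kept.append((problem, trace))
    return kept
```

The key design point is that the positive data now comes from the model's own distribution rather than a stronger teacher, which is what the study credits for the roughly 2x efficiency gain.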

Preference Optimization:
• Utilizes Direct Preference Optimization (DPO) to learn from both positive and negative data.
• Implements two variants: standard DPO and per-step DPO.
• Per-step DPO identifies the “first pit” (the first incorrect step) in a solution trace to focus the objective on critical steps.

This architecture allows for comprehensive analysis of different synthetic data types and learning approaches, enabling the study of their impact on LLM math reasoning capabilities.
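The contrast between standard and per-step DPO can be illustrated with a toy version of the DPO loss plus a “first pit” locator. The log-probabilities below are placeholders for model outputs, and β and the step-level handling are illustrative assumptions, not the paper's exact formulation.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (correct, incorrect) pair of solution traces:
    -log sigmoid(beta * (policy-vs-reference log-ratio of chosen minus rejected))."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def first_pit(step_correct):
    """Index of the 'first pit': the first incorrect step in a trace, or None
    if every step is correct."""
    for i, ok in enumerate(step_correct):
        if not ok:
            return i
    return None
```

Per-step DPO would then restrict the contrastive loss to the tokens from the first-pit step onward, rather than spreading it over the whole trace; this is what lets the method target the critical step where a solution goes wrong.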

    The study reveals significant insights into synthetic data scaling for LLM math reasoning. Positive data scaling shows improvement but with slower rates than pre-training. Surprisingly, self-generated positive data (RFT) outperforms data from more capable models, doubling efficiency. The most striking result comes from strategically using negative data with per-step Direct Preference Optimization, which increases data efficiency by 8x compared to positive data alone. This approach consistently outperforms other methods, highlighting the critical importance of carefully constructing and utilizing both positive and negative synthetic data in LLM training for mathematical reasoning tasks.

    This study explores the impact of synthetic data on improving LLMs’ math reasoning capabilities. It reveals that traditional methods using positive solutions from advanced models show limited efficiency. Self-generated positive data from fine-tuned 7B models improves efficiency by 2x but can amplify reliance on spurious steps. Surprisingly, incorporating negative (incorrect) traces addresses these limitations. By using negative data to estimate step-wise advantages and applying reinforcement learning techniques, the research demonstrates an 8x improvement in synthetic data efficiency. This approach, utilizing preference optimization objectives, significantly enhances LLMs’ mathematical reasoning abilities by effectively balancing positive and negative synthetic data.

Check out the Paper. All credit for this research goes to the researchers of this project.


    The post This AI Paper from CMU and Google DeepMind Studies the Role of Synthetic Data for Improving Math Reasoning Capabilities of LLMs appeared first on MarkTechPost.

