
    Scaling AI Models: Combating Collapse with Reinforced Synthetic Data

    June 15, 2024

As AI-generated data increasingly supplements or even replaces human-annotated data, concerns have arisen about the degradation in model performance when models are iteratively trained on synthetic data. Model collapse refers to this phenomenon, in which a model’s performance deteriorates significantly when it is trained on data synthesized by the model itself. The problem is significant because synthetic data makes up a growing share of training corpora, and collapse undermines the development of efficient, effective methods for tasks such as producing high-quality summaries from large volumes of text.
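A toy numerical sketch (not the paper’s setup) makes the phenomenon concrete: repeatedly fitting a simple Gaussian “model” to samples drawn from its own previous fit tends to drive the estimated spread toward zero, a minimal analogue of collapse under iterative self-training. The function name and parameters below are illustrative assumptions.

```python
import random
import statistics

def iterate_self_training(mu=0.0, sigma=1.0, n=10, generations=1000, seed=0):
    """Fit a Gaussian to samples drawn from its own previous fit, repeatedly."""
    rng = random.Random(seed)
    sigmas = [sigma]
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.fmean(samples)     # refit the mean on self-generated data
        sigma = statistics.stdev(samples)  # refit the spread on self-generated data
        sigmas.append(sigma)
    return sigmas

sigmas = iterate_self_training()
print(f"estimated sigma: start {sigmas[0]:.4f}, "
      f"after {len(sigmas) - 1} generations {sigmas[-1]:.2e}")
```

With a small sample size per generation, the downward bias of the spread estimate compounds across generations, so the fitted distribution degenerates even though each individual refit looks reasonable.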

Current methods to counteract model collapse include Reinforcement Learning from Human Feedback (RLHF), data curation, and prompt engineering. RLHF leverages human feedback to vet the quality of training data, thereby maintaining or enhancing model performance, and it has successfully improved models by ensuring they learn from high-quality, human-approved data. However, this approach is costly and hard to scale, as it relies heavily on human annotators.

Another method involves careful curation and filtering of synthesized data, for example using heuristics or pre-defined rules to discard low-quality or irrelevant samples before training. While this can mitigate the negative impact of low-quality synthesized data, it requires significant effort to maintain the quality of the training dataset, and it only partially removes the risk of model collapse unless the filtering criteria are sufficiently robust. Prompt engineering, meanwhile, involves crafting specific prompts that guide the model to generate higher-quality outputs. It is not foolproof, can be limited by the inherent biases and weaknesses of the model itself, and often requires expert knowledge and iterative experimentation to achieve good results.

    To address these limitations, a team of researchers from Meta AI, NYU, and Peking University propose a method that incorporates feedback on synthesized data, aiming to prevent model collapse through reinforcement techniques. Their approach involves using feedback mechanisms to select or prune synthesized data, ensuring that only high-quality data is used for further training. This method is posited as a more efficient and scalable alternative to RLHF, as it can be partially or fully automated.
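A minimal sketch of the selection idea, assuming a generic scoring function as the feedback signal (the names `select_synthetic_data` and `verifier` below are hypothetical, not the authors’ API):

```python
def select_synthetic_data(candidates, verifier, keep_fraction=0.125):
    """Keep only the highest-scoring fraction of synthesized candidates."""
    scored = sorted(candidates, key=verifier, reverse=True)
    k = max(1, int(len(scored) * keep_fraction))
    return scored[:k]

# Toy usage: candidates are integers and the "verifier" prefers values near 10.
candidates = list(range(100))
best = select_synthetic_data(candidates, verifier=lambda x: -abs(x - 10))
print(f"kept {len(best)} of {len(candidates)} candidates")
```

The verifier can be a human, an oracle, or a stronger model; because the loop only needs a score per sample, it can be partially or fully automated, which is what makes this cheaper than full RLHF.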

    The core of the proposed methodology lies in enhancing synthesized data through feedback mechanisms, which can be from humans or other models. The researchers provide a theoretical framework demonstrating that a Gaussian mixture classification model can achieve optimal performance when trained on feedback-augmented synthesized data.

Two practical experiments validate the theoretical predictions. The first involves training transformers to compute matrix eigenvalues, a task that suffers model collapse when training uses purely synthesized data. Pruning incorrect predictions and selecting the best guesses from the synthesized data significantly improves the model’s performance, demonstrating the effectiveness of reinforcement through data selection. The second experiment focuses on news summarization with large language models (LLMs) such as LLaMA-2. Here, feedback-augmented data prevents performance degradation even as the volume of synthesized data increases, supporting the hypothesis that reinforcement is crucial for maintaining model integrity.
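As a loose analogue of the first experiment (an assumed setup, not the authors’ code), best-of-n selection with a verifier can be sketched as follows: a noisy stand-in “model” proposes several guesses per input, and only the guess the verifier scores best enters the data pool.

```python
import random

def noisy_model(x, rng, noise=0.5):
    """Stand-in model: the true target is x**2, predicted with additive noise."""
    return x * x + rng.gauss(0.0, noise)

def best_of_n(x, rng, n=8):
    """Generate n guesses; a verifier keeps the one closest to the true target."""
    guesses = [noisy_model(x, rng) for _ in range(n)]
    return min(guesses, key=lambda g: abs(g - x * x))

rng = random.Random(0)
raw_err = sum(abs(noisy_model(x, rng) - x * x) for x in range(50)) / 50
sel_err = sum(abs(best_of_n(x, rng) - x * x) for x in range(50)) / 50
print(f"mean error without selection: {raw_err:.3f}, with best-of-8: {sel_err:.3f}")
```

Selecting the best of several guesses sharply reduces the error of the data that would be fed back into training, which is the mechanism the eigenvalue experiment relies on.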

The researchers employ a decoding strategy to generate summaries and assess performance with the ROUGE-1 metric. They also use a strong verifier model, Llama-3, to select the best synthesized data for training. The results show that the proposed method significantly outperforms the original model trained on the full dataset, even when using only 12.5% of the data. The model trained on synthesized data selected by the oracle achieves the best performance, indicating that the proposed method effectively mitigates model collapse. This is a significant finding: it suggests that, when properly reinforced, high-quality synthetic data can match and potentially exceed the quality of human-generated data.
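ROUGE-1 measures unigram overlap between a candidate summary and a reference. A minimal F1 sketch is below; real evaluations typically use a dedicated package that adds stemming and other normalization.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the model avoids collapse", "the model prevents collapse")
print(f"{score:.3f}")  # 3 of 4 unigrams match in each direction -> 0.750
```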

    The research offers a promising solution to the problem of model collapse in LLMs trained on synthesized data. By incorporating feedback mechanisms to enhance the quality of synthetic data, the proposed method ensures sustained model performance without the need for extensive human intervention. This approach provides a scalable, cost-effective alternative to current RLHF methods, paving the way for more robust and reliable AI systems in the future.

    The post Scaling AI Models: Combating Collapse with Reinforced Synthetic Data appeared first on MarkTechPost.

