
    AI models face collapse when trained on AI-generated data, study finds

    July 28, 2024

    A new study published in Nature reveals that AI models, including large language models (LLMs), rapidly degrade in quality when trained on data generated by previous AI models. 

    This phenomenon, termed “model collapse,” could erode the quality of future AI models, particularly as more AI-generated content is published online and then recycled back into model training data. 

    Investigating this phenomenon, researchers from the University of Cambridge, University of Oxford, and other institutions conducted experiments showing that when AI models are repeatedly trained on data produced by earlier versions of themselves, they start generating nonsensical outputs. 

    This was observed across different types of AI models, including language models, variational autoencoders, and Gaussian mixture models.

    In one key experiment with language models, the team fine-tuned the OPT-125m model on the WikiText-2 dataset and then used it to generate new text.

    This AI-generated text was then used to train the next “generation” of the model, and the process was repeated over and over. 

    It wasn’t long before models started producing increasingly improbable and nonsensical text. 

    By the ninth generation, the model was generating complete gibberish, such as listing multiple non-existent types of “jackrabbits” when prompted about English church towers.
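
    A simplified sketch of that loop is shown below, using the Hugging Face transformers and datasets libraries in Python. The model and dataset names follow the article (facebook/opt-125m, WikiText-2); the hyperparameters, helper functions, and the choice to train each generation purely on the previous generation's output are illustrative assumptions rather than the study's exact recipe.

        from datasets import Dataset, load_dataset
        from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                   DataCollatorForLanguageModeling, Trainer,
                                   TrainingArguments, pipeline)

        BASE = "facebook/opt-125m"

        def fine_tune(model_path, texts, out_dir):
            # One "generation": fine-tune a causal LM on a list of raw text strings.
            tok = AutoTokenizer.from_pretrained(BASE)
            model = AutoModelForCausalLM.from_pretrained(model_path)
            ds = Dataset.from_dict({"text": texts}).map(
                lambda batch: tok(batch["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])
            Trainer(
                model=model,
                args=TrainingArguments(output_dir=out_dir, num_train_epochs=1,
                                       per_device_train_batch_size=4),
                train_dataset=ds,
                data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
            ).train()
            model.save_pretrained(out_dir)
            tok.save_pretrained(out_dir)

        def generate_corpus(model_path, prompts, n_tokens=128):
            # Sample synthetic training text from the current generation.
            gen = pipeline("text-generation", model=model_path)
            return [gen(p, max_new_tokens=n_tokens, do_sample=True)[0]["generated_text"]
                    for p in prompts]

        # Generation 0 is seeded with real WikiText-2 text; later generations never see it.
        wiki = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
        prompts = [t[:200] for t in wiki["text"] if len(t) > 200][:500]

        current = BASE
        for generation in range(1, 10):   # the paper reports gibberish by generation nine
            synthetic = generate_corpus(current, prompts)
            next_dir = f"opt125m-gen{generation}"
            fine_tune(current, synthetic, next_dir)
            current = next_dir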

    The researchers also observed how models lose information about “rare” or infrequent events before complete collapse. 

    This is alarming, as rare events often relate to marginalized groups or outliers. Without them, models risk concentrating their responses within a narrow spectrum of ideas and beliefs, reinforcing existing biases.
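
    The loss of tails can be reproduced in a toy setting with the simplest model family the study examines. The snippet below is only an illustration in the spirit of that Gaussian analysis, with arbitrary parameters: a distribution is repeatedly re-fitted to its own samples, and the rare values present in the original data gradually stop appearing.

        import numpy as np

        rng = np.random.default_rng(0)
        data = rng.normal(loc=0.0, scale=1.0, size=20)   # small sample of "human" data (gen 0)

        for generation in range(1, 101):
            mu, sigma = data.mean(), data.std()          # re-fit a Gaussian to the current data
            data = rng.normal(mu, sigma, size=20)        # next generation sees only model samples
            if generation % 20 == 0:
                print(f"gen {generation:3d}: fitted sigma = {sigma:.4f}")

        # With small samples the fitted scale tends to drift and shrink across generations,
        # so values that were merely rare in the original data stop being generated at all.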

    AI companies are aware of this, which is why they’re striking deals with news organizations and publishers to secure a steady stream of high-quality, human-written, topically relevant information. 

    “The message is, we have to be very careful about what ends up in our training data,” study co-author Zakhar Shumaylov from the University of Cambridge told Nature. “Otherwise, things will always, provably, go wrong.”

    Compounding this effect, a recent study by Dr. Richard Fletcher, Director of Research at the Reuters Institute for the Study of Journalism, found that nearly half (48%) of the most popular news sites worldwide are now inaccessible to OpenAI’s crawlers, with Google’s AI crawlers being blocked by 24% of sites.

    As a result, AI models have access to a smaller pool of high-quality, recent data than they once did, increasing the risk of training on sub-standard or outdated data. 
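
    Those blocks are typically implemented through robots.txt user-agent rules, which can be checked with Python’s standard library. GPTBot and Google-Extended are the crawler tokens OpenAI and Google document for AI training; the site below is only a placeholder.

        from urllib import robotparser

        rp = robotparser.RobotFileParser("https://example.com/robots.txt")
        rp.read()                                     # fetch and parse the site's robots.txt
        for agent in ("GPTBot", "Google-Extended"):   # OpenAI's and Google's AI-training crawlers
            print(agent, "allowed:", rp.can_fetch(agent, "https://example.com/"))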

    Solutions to model collapse

    Regarding solutions, the researchers state that maintaining access to original, human-generated data sources is vital for AI’s future. 

    Tracking and managing AI-generated content would also help prevent it from accidentally contaminating training datasets, though that would be very tricky, as AI-generated content is becoming increasingly difficult to detect reliably. 

    Researchers posit four main solutions:

    Watermarking AI-generated content to distinguish it from human-created data
    Creating incentives for humans to continue producing high-quality content
    Developing more sophisticated filtering and curation methods for training data (a minimal example is sketched after this list)
    Exploring ways to preserve and prioritize access to original, non-AI-generated information
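
    As a minimal, hypothetical illustration of the filtering-and-curation idea, a curation step could score each candidate document with an AI-text detector or watermark check and keep only documents that fall below a threshold. The detector itself is left abstract here, since the study does not prescribe one.

        def curate(documents, detector_score, threshold=0.5):
            # Keep documents the detector judges unlikely to be AI-generated.
            # `detector_score` is a placeholder for whatever classifier or watermark
            # check is actually available; assume it returns a value in [0, 1].
            kept = [doc for doc in documents if detector_score(doc) < threshold]
            print(f"kept {len(kept)} of {len(documents)} candidate documents")
            return kept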

    Model collapse is a real problem

    This study is far from the only one exploring model collapse. 

    Not long ago, Stanford researchers compared two scenarios in which model collapse might occur: one in which each new model generation’s training data fully replaces the previous data, and another in which synthetic data is added to the existing dataset.

    When data was replaced, model performance deteriorated rapidly across all tested architectures. 

    However, when data was allowed to “accumulate,” model collapse was largely avoided. The AI systems maintained their performance and, in some cases, showed improvements.
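
    The difference between the two regimes can be written out as a short sketch. The function names and the train / sample_synthetic callables are hypothetical stand-ins for a full training pipeline; only the data-handling logic reflects the comparison described above.

        def replace_regime(real_data, generations, train, sample_synthetic):
            # Each generation is trained only on the previous generation's outputs.
            data, model = list(real_data), None
            for _ in range(generations):
                model = train(data)
                data = sample_synthetic(model, size=len(real_data))   # old data discarded
            return model

        def accumulate_regime(real_data, generations, train, sample_synthetic):
            # Synthetic data is added to a growing pool; the original data is never dropped.
            pool, model = list(real_data), None
            for _ in range(generations):
                model = train(pool)
                pool += sample_synthetic(model, size=len(real_data))  # pool keeps growing
            return model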

    So, despite credible concerns, model collapse isn’t a foregone conclusion; it depends on how much AI-generated data ends up in the training set and on the ratio of synthetic to authentic data. 

    If and when model collapse starts to become evident in frontier models, you can be certain that AI companies will be scrambling for a long-term solution. 

    We’re not there yet, but it might be a matter of when, not if.

    The post AI models face collapse when trained on AI-generated data, study finds appeared first on DailyAI.
