
    How AI Scales with Data Size? This Paper from Stanford Introduces a New Class of Individualized Data Scaling Laws for Machine Learning

    July 5, 2024

    Machine learning models for vision and language have improved significantly in recent years, driven by larger model sizes and vast amounts of high-quality training data. Research shows that more training data improves models predictably, giving rise to scaling laws that describe the relationship between error rates and dataset size. These scaling laws help determine the trade-off between model size and data size, but they treat the dataset as a whole and ignore individual training examples. This is a limitation because some data points are more valuable than others, especially in noisy datasets collected from the web, so it is crucial to understand how each data point or source affects model training.
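    To make the aggregate idea concrete, here is a minimal sketch (not from the paper; the error values are synthetic placeholders) of fitting a scaling law of the form err(n) ≈ a · n^(−b), which is a straight line in log-log space:

```python
# Minimal sketch: fit an aggregate scaling law err(n) ~ a * n^(-b).
# A power law is a straight line in log-log space, so ordinary linear
# regression on the log-transformed values recovers the parameters.
# The error values below are synthetic placeholders, not paper results.
import numpy as np

dataset_sizes = np.array([100, 200, 400, 800, 1600, 3200])
test_errors = np.array([0.420, 0.352, 0.296, 0.248, 0.208, 0.175])

slope, intercept = np.polyfit(np.log(dataset_sizes), np.log(test_errors), 1)
a, b = np.exp(intercept), -slope
print(f"err(n) ~= {a:.3f} * n^(-{b:.3f})")

# Extrapolate the fitted law to a larger dataset size
print(f"predicted error at n=10000: {a * 10000 ** (-b):.3f}")
```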

    The related work falls into two lines. The first concerns scaling laws for deep learning, which have become popular in recent years. These laws help in several ways: understanding the trade-offs between increasing training data and model size, predicting the performance of large models, and comparing how well different learning algorithms perform at smaller scales. The second line focuses on how individual data points affect a model’s performance. These methods typically score training examples by their marginal contribution, and can identify mislabeled data, filter for high-quality data, upweight helpful examples, and select promising new data points for active learning.
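    For illustration, a hedged sketch of the marginal-contribution idea: the value of a training point z at dataset size k is the change in test loss when z is added to a random subset of k other points. The dataset and model here are illustrative stand-ins, not the paper’s exact setup.

```python
# Hedged sketch of a marginal contribution: train a model with and
# without a point z and compare test losses. Dataset/model are stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]

def test_loss(idx):
    """Cross-entropy test loss of a model trained on the given indices."""
    clf = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    return log_loss(y_te, clf.predict_proba(X_te))

rng = np.random.default_rng(0)
z, k = 0, 100                  # the point being valued, and the subset size
subset = rng.choice(np.arange(1, len(y_tr)), size=k, replace=False)
contribution = test_loss(subset) - test_loss(np.append(subset, z))
print(f"marginal contribution of point {z} at k={k}: {contribution:+.5f}")
```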

    Researchers from Stanford University have introduced a new approach by investigating scaling behavior for the value of individual data points. They found that the contribution of a data point to a model’s performance decreases predictably as the dataset grows larger, following a log-linear pattern. However, the rate of this decrease varies across data points: some points are most useful in small datasets, while others retain their value in larger ones. The authors also introduce a maximum likelihood estimator and an amortized estimator to learn these individualized patterns efficiently from a small number of noisy observations per data point.
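    One plausible reading of the log-linear pattern is a per-point power law ψ(k) ≈ c · k^(−α), where c and α are specific to each data point; the sketch below fits such a curve from noisy per-size estimates (the ψ values are synthetic placeholders, and the exact parametric form here is an assumption for illustration):

```python
# Sketch of the per-point log-linear pattern: assume a point's expected
# contribution follows psi(k) ~ c * k^(-alpha), with c and alpha specific
# to that point, and fit them from noisy per-size estimates.
# The psi_hat values below are synthetic placeholders.
import numpy as np

ks = np.array([100, 250, 500, 1000, 2500])
psi_hat = np.array([4.1e-3, 1.9e-3, 1.1e-3, 5.2e-4, 2.4e-4])  # noisy estimates

slope, intercept = np.polyfit(np.log(ks), np.log(psi_hat), 1)
c, alpha = np.exp(intercept), -slope
print(f"psi(k) ~= {c:.4f} * k^(-{alpha:.2f})")

# Each training point gets its own (c, alpha); extrapolate this point's
# value to a dataset size beyond the measured range:
print(f"predicted contribution at k=10000: {c * 10000 ** (-alpha):.2e}")
```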

    Experiments are carried out to provide evidence for the parametric scaling law, focusing on three types of models: logistic regression, SVMs, and MLPs (specifically, two-layer ReLU networks). These models are tested on three datasets: MiniBooNE, CIFAR-10, and IMDB movie reviews. Pre-trained embeddings from a frozen ResNet-50 and BERT are used to speed up training and prevent underfitting on CIFAR-10 and IMDB, respectively. Each model’s performance is measured by cross-entropy loss on a test set of 1,000 samples. For logistic regression, 1,000 data points and 1,000 samples per dataset size k are used; for SVMs and MLPs, whose marginal contributions have higher variance, 200 data points and 5,000 samples per dataset size k are used.
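    The sampling protocol amounts to a Monte Carlo loop like the sketch below, where eval_loss is a hypothetical stand-in for any routine that trains a model on a given index set and returns its test loss (not the paper’s code):

```python
# Sketch of the sampling protocol: estimate E[psi_k(z)] by averaging the
# marginal contribution of point z over many random subsets of size k.
# `eval_loss` is a hypothetical stand-in for a routine that trains a model
# on the given training indices and returns its test loss.
import numpy as np

def estimate_psi(z, k, n_samples, eval_loss, n_train, seed=0):
    """Monte Carlo estimate of point z's expected contribution at size k."""
    rng = np.random.default_rng(seed)
    pool = np.delete(np.arange(n_train), z)   # all training points except z
    deltas = np.empty(n_samples)
    for i in range(n_samples):
        subset = rng.choice(pool, size=k, replace=False)
        deltas[i] = eval_loss(subset) - eval_loss(np.append(subset, z))
    return deltas.mean()
```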

    The proposed methods are tested by checking how accurately they predict expected marginal contributions at each dataset size. For instance, with the IMDB dataset and logistic regression, the expectations can be predicted accurately for dataset sizes ranging from k = 100 to k = 1000. To evaluate this systematically, the accuracy of the scaling-law predictions is measured across dataset sizes for both versions of the likelihood-based estimator, using different numbers of samples. A more detailed analysis shows that the R² score drops when predictions are extrapolated beyond k = 2500, while the correlation and rank correlation with the true expectations remain high.
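    These evaluation metrics can be computed as in the sketch below (the arrays are illustrative placeholders, not the paper’s numbers):

```python
# Sketch of the evaluation: compare scaling-law predictions against "true"
# expected contributions at a held-out dataset size using R^2, correlation,
# and rank correlation. The arrays are illustrative placeholders.
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import r2_score

true_psi = np.array([2.4e-4, -1.1e-4, 5.0e-4, 8.0e-5, 3.1e-4])
pred_psi = np.array([2.1e-4, -0.9e-4, 4.6e-4, 1.0e-4, 2.8e-4])

print(f"R^2:      {r2_score(true_psi, pred_psi):.3f}")
print(f"pearson:  {pearsonr(true_psi, pred_psi)[0]:.3f}")
print(f"spearman: {spearmanr(true_psi, pred_psi)[0]:.3f}")
```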

    In conclusion, the Stanford researchers examined how the value of individual data points changes with scale and found evidence for a simple pattern that holds across datasets and model types. Experiments confirmed this scaling law, showing a clear log-linear trend and testing how well it predicts contributions at different dataset sizes. The law can be used to predict behavior at dataset sizes larger than those measured. However, measuring this behavior for an entire training dataset is expensive, so the researchers developed estimators that recover the scaling parameters from a small number of noisy observations per data point.

    The findings also underscore the central role of high-quality data in AI research.

    Check out the Paper. All credit for this research goes to the researchers of this project.
