    This AI Paper from Meta and MBZUAI Introduces a Principled AI Framework to Examine Highly Accurate Scaling Laws Concerning Model Size Versus Its Knowledge Storage Capacity

    April 13, 2024

Research on scaling laws for LLMs explores the relationship between model size, training time, and performance. Established principles suggest an optimal allocation of training resources for a given model size, but recent studies challenge these notions by showing that smaller models trained with more compute can outperform larger ones. And although emergent behaviors in large models are well documented, there is little quantitative analysis of how model size determines capacity once training is sufficient. Traditional theories hold that increasing model size improves memorization, generalization, and the ability to fit complex functions, yet practical outcomes often deviate because of overlooked factors.

Researchers from Meta/FAIR Labs and Mohamed bin Zayed University of AI have devised a systematic framework to investigate the precise scaling laws governing the relationship between the size of language models and their capacity to store knowledge. While it is commonly assumed that larger models can hold more knowledge, the study asks whether total knowledge scales linearly with model size and what constant defines this scaling. Understanding this constant is pivotal for evaluating how efficiently transformer models store knowledge and how factors such as architecture, quantization, and training duration affect that capacity. To measure it, they define knowledge as (name, attribute, value) tuples, generate synthetic datasets from these tuples, and train language models of varying sizes on them, then evaluate storage efficiency by comparing trainable parameters to the minimum number of bits required to encode the knowledge.
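To make the setup concrete, here is a minimal sketch of how such a synthetic knowledge set might be generated; the helper names, attribute list, and sentence template are illustrative assumptions, not the paper's actual data pipeline:

```python
import random
import string

def random_token(length=8):
    """Random lowercase string used as a synthetic name or attribute value."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

def make_synthetic_knowledge(num_people=1000,
                             attributes=("birth city", "employer", "university major")):
    """Build (name, attribute, value) tuples, one value per person per attribute."""
    knowledge = []
    for _ in range(num_people):
        name = random_token()
        for attribute in attributes:
            knowledge.append((name, attribute, random_token()))
    return knowledge

def tuples_to_sentences(knowledge):
    """Render each tuple as a short training sentence for the language model."""
    return [f"{name}'s {attribute} is {value}." for name, attribute, value in knowledge]

corpus = tuples_to_sentences(make_synthetic_knowledge())
```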

Language models store factual knowledge as tuples, each consisting of three strings: (name, attribute, value). The study estimates the number of knowledge bits a language model can store, and finds that models can store roughly 2 bits of knowledge per parameter. Training duration, model architecture, quantization, sparsity constraints, and the data's signal-to-noise ratio all affect this capacity. Prepending domain names such as wikipedia.org to the training data significantly increases capacity by allowing the model to identify and prioritize knowledge-rich domains.
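As a rough illustration of the capacity ratio (not the paper's bit-complexity formula), the arithmetic might look like the following, assuming each fact's value is drawn uniformly from a fixed pool:

```python
import math

def dataset_knowledge_bits(num_people, num_attributes, value_vocab_size):
    """Information content of the synthetic knowledge set: each of the
    num_people * num_attributes facts takes one of value_vocab_size equally
    likely values, so it carries log2(value_vocab_size) bits."""
    return num_people * num_attributes * math.log2(value_vocab_size)

def capacity_ratio(stored_bits, num_parameters):
    """Bits of knowledge per trainable parameter."""
    return stored_bits / num_parameters

# At 2 bits/parameter, a 124M-parameter GPT2-style model could hold ~248M bits.
# With values drawn from 26**8 random 8-letter strings (~37.6 bits per fact),
# that is roughly 1.3M people x 5 attributes:
total_bits = dataset_knowledge_bits(1_300_000, 5, 26 ** 8)
print(capacity_ratio(total_bits, 124_000_000))  # ~1.97, close to the 2-bit/param figure
```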

In the investigation, the researchers focus on factual knowledge represented as tuples, such as (USA, capital, Washington D.C.), and establish that language models can store approximately 2 bits of knowledge per parameter, even after quantization to int8. Moreover, they find that prepending domain names to the training data significantly enhances a model's knowledge capacity, enabling it to identify and prioritize knowledge-rich domains autonomously. Through controlled experiments, they show how training duration, architecture, quantization, sparsity constraints, and data signal-to-noise ratio affect knowledge storage capacity, offering practical guidance for developing and optimizing language models.
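A minimal sketch of the domain-prepending idea follows; the example sentences and the "junk" text are hypothetical, and the paper's exact tokenization and corpus mixing are not reproduced here:

```python
def tag_with_domain(sentences, domain):
    """Prepend a source-domain string to every training sequence from that source."""
    return [f"{domain} {sentence}" for sentence in sentences]

# Knowledge-rich data carries a recognizable tag such as "wikipedia.org", while
# low-quality data keeps its own (or no) domain, letting the model learn which
# sources deserve its limited capacity.
useful = tag_with_domain(["qzkytwpa's birth city is mvohrela."], "wikipedia.org")
junk = ["random filler text that encodes no reusable facts"]
training_corpus = useful + junk
```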

The study outlines key findings on language model capacity:

• GPT2 consistently achieves a 2-bit-per-parameter capacity ratio across diverse data settings, implying that a 7B model could exceed the knowledge contained in English Wikipedia.

• Longer training, on the order of 1,000 exposures per piece of knowledge, is needed to sustain this ratio.

• Model architecture influences capacity: GPT2 outperforms LLaMA/Mistral, a gap attributed to the gated MLP used in the latter.

• Quantization to int8 preserves capacity, while int4 reduces it (see the sketch after this list).

• Mixture-of-experts models slightly decrease capacity but remain efficient.

• Junk data significantly reduces capacity, but prepending domain names to the useful data mitigates this effect.

This systematic approach enables precise comparisons between models and yields insights into critical aspects such as training time, architecture, quantization, and data quality.
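As an illustration of the int8 point above, here is a hedged PyTorch sketch of symmetric per-tensor weight quantization; it is not the paper's quantization code, only a reminder that int8 rounding keeps the parameter count (and thus the roughly 2-bit-per-parameter budget) intact while discarding far less information than int4 would:

```python
import torch

def quantize_weights_int8(state_dict):
    """Symmetric per-tensor int8 quantization of a trained model's weights.
    The parameter count is unchanged, so the ~2 bit/param knowledge budget
    is preserved; the paper reports a capacity drop only at int4."""
    quantized = {}
    for name, weight in state_dict.items():
        if not torch.is_floating_point(weight):
            quantized[name] = weight            # leave non-float buffers untouched
            continue
        scale = weight.abs().max().clamp(min=1e-12) / 127.0
        q = torch.round(weight / scale).clamp(-127, 127).to(torch.int8)
        quantized[name] = (q, scale)            # int8 tensor plus its dequantization scale
    return quantized

# Example usage on a toy module:
model = torch.nn.Linear(16, 16)
int8_weights = quantize_weights_int8(model.state_dict())
```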

In conclusion, the researchers identify a consistent pattern in language model scaling laws: a fully trained transformer can effectively store 2 bits of knowledge per parameter, regardless of its size and even after quantization to int8. They also chart how hyperparameters such as training duration, model architecture, precision, and data quality shift these scaling laws. The methodology offers a rigorous framework for comparing model capabilities, aiding practitioners in model selection and training decisions, and it lays the groundwork for answering the fundamental question of optimal language model size, potentially informing future progress toward artificial general intelligence (AGI).

Check out the Paper. All credit for this research goes to the researchers of this project.
