    Can large language models figure out the real world?

    August 26, 2025

    Back in the 17th century, German astronomer Johannes Kepler figured out the laws of motion that made it possible to accurately predict where our solar system’s planets would appear in the sky as they orbit the sun. But it wasn’t until decades later, when Isaac Newton formulated the universal laws of gravitation, that the underlying principles were understood. Although they were inspired by Kepler’s laws, they went much further, and made it possible to apply the same formulas to everything from the trajectory of a cannon ball to the way the moon’s pull controls the tides on Earth — or how to launch a satellite from Earth to the surface of the moon or planets.

Today’s sophisticated artificial intelligence systems have gotten very good at making the kind of specific predictions that resemble Kepler’s orbit predictions. But do they know why these predictions work, with the kind of deep understanding that comes from basic principles like Newton’s laws? As the world grows ever more dependent on these kinds of AI systems, researchers are struggling to measure just how they do what they do, and how deep their understanding of the real world actually is.

    Now, researchers in MIT’s Laboratory for Information and Decision Systems (LIDS) and at Harvard University have devised a new approach to assessing how deeply these predictive systems understand their subject matter, and whether they can apply knowledge from one domain to a slightly different one. And by and large the answer at this point, in the examples they studied, is — not so much.

    The findings were presented at the International Conference on Machine Learning, in Vancouver, British Columbia, last month by Harvard postdoc Keyon Vafa, MIT graduate student in electrical engineering and computer science and LIDS affiliate Peter G. Chang, MIT assistant professor and LIDS principal investigator Ashesh Rambachan, and MIT professor, LIDS principal investigator, and senior author Sendhil Mullainathan.

    “Humans all the time have been able to make this transition from good predictions to world models,” says Vafa, the study’s lead author. So the question their team was addressing was, “have foundation models — has AI — been able to make that leap from predictions to world models? And we’re not asking are they capable, or can they, or will they. It’s just, have they done it so far?” he says.

    “We know how to test whether an algorithm predicts well. But what we need is a way to test for whether it has understood well,” says Mullainathan, the Peter de Florez Professor with dual appointments in the MIT departments of Economics and Electrical Engineering and Computer Science and the senior author on the study. “Even defining what understanding means was a challenge.” 

    In the Kepler versus Newton analogy, Vafa says, “they both had models that worked really well on one task, and that worked essentially the same way on that task. What Newton offered was ideas that were able to generalize to new tasks.” That capability, when applied to the predictions made by various AI systems, would entail having it develop a world model so it can “transcend the task that you’re working on and be able to generalize to new kinds of problems and paradigms.”

    Another analogy that helps to illustrate the point is the difference between centuries of accumulated knowledge of how to selectively breed crops and animals, versus Gregor Mendel’s insight into the underlying laws of genetic inheritance.

    “There is a lot of excitement in the field about using foundation models to not just perform tasks, but to learn something about the world,” for example in the natural sciences, he says. “It would need to adapt, have a world model to adapt to any possible task.”

    Are AI systems anywhere near the ability to reach such generalizations? To test the question, the team looked at different examples of predictive AI systems, at different levels of complexity. On the very simplest of examples, the systems succeeded in creating a realistic model of the simulated system, but as the examples got more complex that ability faded fast.

    The team developed a new metric, a way of measuring quantitatively how well a system approximates real-world conditions. They call the measurement inductive bias — that is, a tendency or bias toward responses that reflect reality, based on inferences developed from looking at vast amounts of data on specific cases.

    The simplest level of examples they looked at was known as a lattice model. In a one-dimensional lattice, something can move only along a line. Vafa compares it to a frog jumping between lily pads in a row. As the frog jumps or sits, it calls out what it’s doing — right, left, or stay. If it reaches the last lily pad in the row, it can only stay or go back. If someone, or an AI system, can just hear the calls, without knowing anything about the number of lily pads, can it figure out the configuration? The answer is yes: Predictive models do well at reconstructing the “world” in such a simple case. But even with lattices, as you increase the number of dimensions, the systems no longer can make that leap.
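
To make that setup concrete, here is a minimal sketch in Python of the one-dimensional lattice world as described above. The function names (simulate_frog, reconstruct_positions) are hypothetical illustrations, not code from the study: one simulates the frog's calls, the other shows the hidden state a listener would have to track to recover the configuration from the calls alone.

```python
import random

def simulate_frog(num_pads, num_steps, seed=0):
    """Toy 1-D lattice: a frog hops along a row of lily pads, calling out
    each move. At either end of the row it can only stay or move back inward."""
    rng = random.Random(seed)
    pos = rng.randrange(num_pads)
    calls, positions = [], [pos]
    for _ in range(num_steps):
        options = ["stay"]
        if pos > 0:
            options.append("left")
        if pos < num_pads - 1:
            options.append("right")
        call = rng.choice(options)
        pos += {"left": -1, "right": 1, "stay": 0}[call]
        calls.append(call)
        positions.append(pos)
    return calls, positions

def reconstruct_positions(calls, start):
    """The 'world model' a listener needs: track the hidden pad from the
    calls alone, given only the starting pad."""
    pos, trace = start, [start]
    for call in calls:
        pos += {"left": -1, "right": 1, "stay": 0}[call]
        trace.append(pos)
    return trace

calls, true_positions = simulate_frog(num_pads=5, num_steps=30)
assert reconstruct_positions(calls, true_positions[0]) == true_positions
print(calls[:10])
```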

    “For example, in a two-state or three-state lattice, we showed that the model does have a pretty good inductive bias toward the actual state,” says Chang. “But as we increase the number of states, then it starts to have a divergence from real-world models.”

    A more complex problem is a system that can play the board game Othello, which involves players alternately placing black or white disks on a grid. The AI models can accurately predict what moves are allowable at a given point, but it turns out they do badly at inferring what the overall arrangement of pieces on the board is, including ones that are currently blocked from play.
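
As a hedged illustration of that distinction (hypothetical Python, not the authors' code), the sketch below replays an Othello move sequence to recover the full board. The legal_moves function corresponds to the "allowable moves" the models predict well; replay recovers the hidden board state, including flipped and blocked squares, that the models struggle to infer.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def initial_board():
    board = [[EMPTY] * 8 for _ in range(8)]
    board[3][3], board[4][4] = WHITE, WHITE
    board[3][4], board[4][3] = BLACK, BLACK
    return board

def flips_for(board, row, col, player):
    """Opponent disks that placing `player` at (row, col) would flip."""
    if board[row][col] != EMPTY:
        return []
    opponent = BLACK if player == WHITE else WHITE
    flipped = []
    for dr, dc in DIRECTIONS:
        line, r, c = [], row + dr, col + dc
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opponent:
            line.append((r, c))
            r, c = r + dr, c + dc
        if line and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            flipped.extend(line)
    return flipped

def legal_moves(board, player):
    """The 'allowable next moves' that the sequence models predict well."""
    return [(r, c) for r in range(8) for c in range(8) if flips_for(board, r, c, player)]

def replay(moves):
    """Recover the true board from a move sequence (the hidden world state).
    Assumes strictly alternating legal moves with no passes."""
    board, player = initial_board(), BLACK
    for row, col in moves:
        for r, c in flips_for(board, row, col, player) + [(row, col)]:
            board[r][c] = player
        player = WHITE if player == BLACK else BLACK
    return board

board = replay([(2, 3), (2, 2), (3, 2)])  # a short legal opening: Black, White, Black
print(legal_moves(board, WHITE))
print("\n".join(" ".join(row) for row in board))
```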

The team then looked at five different categories of predictive models actually in use, and again, the more complex the systems involved, the more poorly the predictive models performed at matching the true underlying world model.

    With this new metric of inductive bias, “our hope is to provide a kind of test bed where you can evaluate different models, different training approaches, on problems where we know what the true world model is,” Vafa says. If it performs well on these cases where we already know the underlying reality, then we can have greater faith that its predictions may be useful even in cases “where we don’t really know what the truth is,” he says.

    People are already trying to use these kinds of predictive AI systems to aid in scientific discovery, including such things as properties of chemical compounds that have never actually been created, or of potential pharmaceutical compounds, or for predicting the folding behavior and properties of unknown protein molecules. “For the more realistic problems,” Vafa says, “even for something like basic mechanics, we found that there seems to be a long way to go.”

    Chang says, “There’s been a lot of hype around foundation models, where people are trying to build domain-specific foundation models — biology-based foundation models, physics-based foundation models, robotics foundation models, foundation models for other types of domains where people have been collecting a ton of data” and training these models to make predictions, “and then hoping that it acquires some knowledge of the domain itself, to be used for other downstream tasks.”

    This work shows there’s a long way to go, but it also helps to show a path forward. “Our paper suggests that we can apply our metrics to evaluate how much the representation is learning, so that we can come up with better ways of training foundation models, or at least evaluate the models that we’re training currently,” Chang says. “As an engineering field, once we have a metric for something, people are really, really good at optimizing that metric.”
