
    Can large language models figure out the real world?

    August 26, 2025

    Back in the 17th century, German astronomer Johannes Kepler worked out the laws of planetary motion that made it possible to accurately predict where our solar system’s planets would appear in the sky as they orbit the sun. But it wasn’t until decades later, when Isaac Newton formulated the law of universal gravitation, that the underlying principles were understood. Although Newton’s laws were inspired by Kepler’s, they went much further, making it possible to apply the same formulas to everything from the trajectory of a cannonball to the way the moon’s pull controls the tides on Earth, or to launching a satellite from Earth to the surface of the moon or planets.

    Today’s sophisticated artificial intelligence systems have gotten very good at making the kind of specific predictions that resemble Kepler’s orbit predictions. But do they know why these predictions work, with the kind of deep understanding that comes from basic principles like Newton’s laws? As the world grows ever more dependent on these kinds of AI systems, researchers are struggling to measure just how they do what they do, and how deep their understanding of the real world actually is.

    Now, researchers in MIT’s Laboratory for Information and Decision Systems (LIDS) and at Harvard University have devised a new approach to assessing how deeply these predictive systems understand their subject matter, and whether they can apply knowledge from one domain to a slightly different one. And by and large the answer at this point, in the examples they studied, is — not so much.

    The findings were presented at the International Conference on Machine Learning, in Vancouver, British Columbia, last month by Harvard postdoc Keyon Vafa, MIT graduate student in electrical engineering and computer science and LIDS affiliate Peter G. Chang, MIT assistant professor and LIDS principal investigator Ashesh Rambachan, and MIT professor, LIDS principal investigator, and senior author Sendhil Mullainathan.

    “Humans all the time have been able to make this transition from good predictions to world models,” says Vafa, the study’s lead author. So the question their team was addressing was, “have foundation models — has AI — been able to make that leap from predictions to world models? And we’re not asking are they capable, or can they, or will they. It’s just, have they done it so far?” he says.

    “We know how to test whether an algorithm predicts well. But what we need is a way to test for whether it has understood well,” says Mullainathan, the Peter de Florez Professor with dual appointments in the MIT departments of Economics and Electrical Engineering and Computer Science and the senior author on the study. “Even defining what understanding means was a challenge.”

    In the Kepler versus Newton analogy, Vafa says, “they both had models that worked really well on one task, and that worked essentially the same way on that task. What Newton offered was ideas that were able to generalize to new tasks.” That capability, when applied to the predictions made by various AI systems, would entail having the system develop a world model so it can “transcend the task that you’re working on and be able to generalize to new kinds of problems and paradigms.”
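The Kepler versus Newton contrast can be made concrete with a small numerical sketch. This is an illustration only, not something taken from the study: the function names are hypothetical, and the constants are rounded textbook values. It compares Kepler's third law, calibrated on bodies orbiting the sun, with the period formula that follows from Newton's gravitation, on one task both handle (Mars around the sun) and one new task (the moon around Earth).

```python
import math

# Rounded textbook constants (assumed values for illustration).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # mass of the sun, kg
M_EARTH = 5.972e24   # mass of Earth, kg
AU = 1.496e11        # astronomical unit, m
SECONDS_PER_DAY = 86400.0
DAYS_PER_YEAR = 365.25


def kepler_period_years(a_au):
    """Kepler's third law as calibrated on the sun's planets:
    T^2 = a^3, with T in years and a in astronomical units."""
    return a_au ** 1.5


def newton_period_days(a_m, central_mass_kg):
    """Orbital period of a small body around a central mass M:
    T = 2 * pi * sqrt(a^3 / (G * M)), returned in days."""
    seconds = 2 * math.pi * math.sqrt(a_m ** 3 / (G * central_mass_kg))
    return seconds / SECONDS_PER_DAY


# Task both models were "trained" on: a planet orbiting the sun.
mars_a_au = 1.524
mars_kepler_days = kepler_period_years(mars_a_au) * DAYS_PER_YEAR
mars_newton_days = newton_period_days(mars_a_au * AU, M_SUN)

# New task: the moon orbiting Earth. Kepler's sun-calibrated rule
# has no way to account for the different central mass.
moon_a_m = 3.844e8
moon_kepler_days = kepler_period_years(moon_a_m / AU) * DAYS_PER_YEAR
moon_newton_days = newton_period_days(moon_a_m, M_EARTH)

print(f"Mars: Kepler {mars_kepler_days:.1f} d, Newton {mars_newton_days:.1f} d")
print(f"Moon: Kepler {moon_kepler_days:.3f} d, Newton {moon_newton_days:.1f} d")
```

On Mars the two approaches agree at roughly 687 days. On the moon, the sun-calibrated rule gives a period far shorter than a day, while the Newtonian formula gives roughly 27 days, close to the observed month. Matching predictions on one task says little about which model will transfer to another.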

    Another analogy that helps to illustrate the point is the difference between centuries of accumulated knowledge of how to selectively breed crops and animals, versus Gregor Mendel’s insight into the underlying laws of genetic inheritance.

    “There is a lot of excitement in the field about using foundation models to not just perform tasks, but to learn something about the world,” for example in the natural sciences, he says. “It would need to adapt, have a world model to adapt to any possible task.”

    Are AI systems anywhere near the ability to reach such generalizations? To find out, the team looked at different examples of predictive AI systems, at different levels of complexity. On the very simplest examples, the systems succeeded in creating a realistic model of the simulated system, but as the examples got more complex, that ability faded fast.

    The team developed a new metric, a way of measuring quantitatively how well a system approximates real-world conditions. They call the measurement inductive bias — that is, a tendency or bias toward responses that reflect reality, based on inferences developed from looking at vast amounts of data on specific cases.

    The simplest examples they looked at are known as lattice models. In a one-dimensional lattice, something can move only along a line. Vafa compares it to a frog jumping between lily pads in a row. As the frog jumps or sits, it calls out what it’s doing: right, left, or stay. If it reaches the last lily pad in the row, it can only stay or go back. If someone, or an AI system, can just hear the calls, without knowing anything about the number of lily pads, can it figure out the configuration? The answer is yes: Predictive models do well at reconstructing the “world” in such a simple case. But even with lattices, as you increase the number of dimensions, the systems no longer can make that leap.
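The frog-and-lily-pad setup is simple enough to simulate. The sketch below is a toy illustration, not the researchers' code or metric, and the function names are hypothetical. It generates the frog's calls on a row of pads, then shows how a listener who hears only the calls could bound the number of pads by tracking how far the frog has ranged from its starting point.

```python
import random

STEP = {"left": -1, "stay": 0, "right": 1}


def simulate_calls(n_pads, n_steps, seed=0):
    """Random walk on a row of n_pads lily pads.
    At either end of the row the frog can only stay or go back."""
    rng = random.Random(seed)
    position = rng.randrange(n_pads)
    calls = []
    for _ in range(n_steps):
        options = ["stay"]
        if position > 0:
            options.append("left")
        if position < n_pads - 1:
            options.append("right")
        call = rng.choice(options)
        position += STEP[call]
        calls.append(call)
    return calls


def min_pads_consistent_with(calls):
    """Smallest number of pads that could have produced these calls.
    Tracks the frog's offset from its unknown starting pad; the row
    must be at least as long as the range of offsets visited."""
    offset = lowest = highest = 0
    for call in calls:
        offset += STEP[call]
        lowest = min(lowest, offset)
        highest = max(highest, offset)
    return highest - lowest + 1


calls = simulate_calls(n_pads=5, n_steps=2000, seed=0)
estimate = min_pads_consistent_with(calls)
print("first calls:", calls[:8])
print("pads needed to explain the calls:", estimate)
```

The estimate can never exceed the true number of pads, and with enough calls it should reach it once the frog has visited both ends of the row. Here the listener is handed the structure of the world (a line of pads) in advance. The question the study asks is whether a model trained only to predict the next call arrives at an equivalent picture on its own.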

    “For example, in a two-state or three-state lattice, we showed that the model does have a pretty good inductive bias toward the actual state,” says Chang. “But as we increase the number of states, then it starts to have a divergence from real-world models.”

    A more complex problem is a system that can play the board game Othello, which involves players alternately placing black or white disks on a grid. The AI models can accurately predict what moves are allowable at a given point, but it turns out they do badly at inferring what the overall arrangement of pieces on the board is, including ones that are currently blocked from play.
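One way to see why predicting allowable moves is a weaker requirement than knowing the board: different boards can permit exactly the same moves. The sketch below is an illustration under stated assumptions, not the study's evaluation. The helper names are hypothetical, and the second board adds a disk in a corner purely for demonstration; such a position would not arise in a real game.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"
DIRECTIONS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0)]


def initial_board():
    """Standard 8x8 Othello starting position."""
    board = [[EMPTY] * 8 for _ in range(8)]
    board[3][3] = board[4][4] = WHITE
    board[3][4] = board[4][3] = BLACK
    return board


def legal_moves(board, player):
    """Empty squares where `player` would bracket at least one
    straight run of opponent disks against one of their own."""
    opponent = WHITE if player == BLACK else BLACK
    moves = set()
    for r in range(8):
        for c in range(8):
            if board[r][c] != EMPTY:
                continue
            for dr, dc in DIRECTIONS:
                rr, cc = r + dr, c + dc
                seen_opponent = False
                # Walk over a contiguous run of opponent disks.
                while (0 <= rr < 8 and 0 <= cc < 8
                       and board[rr][cc] == opponent):
                    rr, cc = rr + dr, cc + dc
                    seen_opponent = True
                # The run must end on one of the player's own disks.
                if (seen_opponent and 0 <= rr < 8 and 0 <= cc < 8
                        and board[rr][cc] == player):
                    moves.add((r, c))
                    break
    return moves


board_a = initial_board()
board_b = initial_board()
board_b[0][0] = BLACK  # contrived extra disk, for illustration only

moves_a = legal_moves(board_a, BLACK)
moves_b = legal_moves(board_b, BLACK)
print("same legal moves:", moves_a == moves_b)
print("same board:", board_a == board_b)
```

A predictor judged only on legal moves would score identically on both boards, so a perfect score on that task does not show that it tracks where every disk actually is.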

    The team then looked at five different categories of predictive models actually in use, and again, the more complex the systems involved, the more poorly the predictive models performed at matching the true underlying world model.

    With this new metric of inductive bias, “our hope is to provide a kind of test bed where you can evaluate different models, different training approaches, on problems where we know what the true world model is,” Vafa says. If a model performs well on these cases where we already know the underlying reality, then we can have greater faith that its predictions may be useful even in cases “where we don’t really know what the truth is,” he says.

    People are already trying to use these kinds of predictive AI systems to aid in scientific discovery, for example to predict the properties of chemical compounds that have never actually been created, or of potential pharmaceutical compounds, or the folding behavior and properties of unknown protein molecules. “For the more realistic problems,” Vafa says, “even for something like basic mechanics, we found that there seems to be a long way to go.”

    Chang says, “There’s been a lot of hype around foundation models, where people are trying to build domain-specific foundation models — biology-based foundation models, physics-based foundation models, robotics foundation models, foundation models for other types of domains where people have been collecting a ton of data” and training these models to make predictions, “and then hoping that it acquires some knowledge of the domain itself, to be used for other downstream tasks.”

    This work shows there’s a long way to go, but it also helps to show a path forward. “Our paper suggests that we can apply our metrics to evaluate how much the representation is learning, so that we can come up with better ways of training foundation models, or at least evaluate the models that we’re training currently,” Chang says. “As an engineering field, once we have a metric for something, people are really, really good at optimizing that metric.”
