    Despite its impressive output, generative AI doesn’t have a coherent understanding of the world

    November 5, 2024

    Large language models can do impressive things, like write poetry or generate viable computer programs, even though these models are trained to predict words that come next in a piece of text.

    Such surprising capabilities can make it seem like the models are implicitly learning some general truths about the world.

    But that isn’t necessarily the case, according to a new study. The researchers found that a popular type of generative AI model can provide turn-by-turn driving directions in New York City with near-perfect accuracy — without having formed an accurate internal map of the city.

    Despite the model’s uncanny ability to navigate effectively, when the researchers closed some streets and added detours, its performance plummeted.

    When they dug deeper, the researchers found that the New York maps the model implicitly generated had many nonexistent streets curving between the grid and connecting faraway intersections.

    This could have serious implications for generative AI models deployed in the real world, since a model that seems to be performing well in one context might break down if the task or environment slightly changes.

    “One hope is that, because LLMs can accomplish all these amazing things in language, maybe we could use these same tools in other parts of science, as well. But the question of whether LLMs are learning coherent world models is very important if we want to use these techniques to make new discoveries,” says senior author Ashesh Rambachan, assistant professor of economics and a principal investigator in the MIT Laboratory for Information and Decision Systems (LIDS).

    Rambachan is joined on a paper about the work by lead author Keyon Vafa, a postdoc at Harvard University; Justin Y. Chen, an electrical engineering and computer science (EECS) graduate student at MIT; Jon Kleinberg, Tisch University Professor of Computer Science and Information Science at Cornell University; and Sendhil Mullainathan, an MIT professor in the departments of EECS and of Economics, and a member of LIDS. The research will be presented at the Conference on Neural Information Processing Systems.

    New metrics

    The researchers focused on a type of generative AI model known as a transformer, which forms the backbone of LLMs like GPT-4. Transformers are trained on a massive amount of language-based data to predict the next token in a sequence, such as the next word in a sentence.
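    To make that objective concrete, here is a toy next-word predictor in Python. It is only an illustrative sketch: it counts which word follows which in the training text, whereas a real transformer learns the same next-token prediction with a neural network rather than counts.

        from collections import Counter, defaultdict

        # Toy next-word "model": count which word follows which in the
        # training text, then predict the most frequent follower.
        def train(text):
            follows = defaultdict(Counter)
            words = text.split()
            for word, nxt in zip(words, words[1:]):
                follows[word][nxt] += 1
            return follows

        model = train("the cat sat on the mat and the cat ran")
        print(model["the"].most_common(1))  # [('cat', 2)]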

    But if scientists want to determine whether an LLM has formed an accurate model of the world, measuring the accuracy of its predictions doesn’t go far enough, the researchers say.

    For example, they found that a transformer can predict valid moves in a game of Connect 4 nearly every time without understanding any of the rules.

    So, the team developed two new metrics that can test a transformer’s world model. The researchers focused their evaluations on a class of problems called deterministic finite automata, or DFAs.

    A DFA is a problem with a sequence of states, like intersections one must traverse to reach a destination, and a concrete way of describing the rules one must follow along the way.
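    For intuition, here is a minimal DFA in Python; the two-intersection "street grid" is an invented example for illustration, not one from the paper.

        # A DFA: a set of states plus a transition function mapping
        # (state, symbol) -> next state. Here states play the role of
        # intersections and symbols play the role of turns.
        class DFA:
            def __init__(self, transitions, start):
                self.transitions = transitions  # dict: (state, symbol) -> state
                self.start = start

            def run(self, symbols):
                """Follow the rules along a sequence of symbols."""
                state = self.start
                for s in symbols:
                    state = self.transitions[(state, s)]
                return state

            def valid_moves(self, state):
                """Symbols with a defined transition from this state."""
                return {sym for (st, sym) in self.transitions if st == state}

        # Toy street grid: two intersections connected by labeled turns.
        grid = DFA({("A", "left"): "B", ("A", "right"): "A",
                    ("B", "left"): "A", ("B", "right"): "B"}, start="A")
        print(grid.run(["left", "right"]))  # "B"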

    They chose two problems to formulate as DFAs: navigating on streets in New York City and playing the board game Othello.

    “We needed test beds where we know what the world model is. Now, we can rigorously think about what it means to recover that world model,” Vafa explains.

    The first metric they developed, called sequence distinction, says a model has formed a coherent world model if it sees two different states, like two different Othello boards, and recognizes how they are different. Sequences, meaning ordered lists of data points, are what transformers use to generate outputs.

    The second metric, called sequence compression, says a transformer with a coherent world model should know that two identical states, like two identical Othello boards, have the same sequence of possible next steps.
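    As a rough sketch, the two checks might look like the following, reusing the toy grid DFA above. Here model_valid_next stands in for whatever set of next moves the transformer would accept after a sequence; the name, and the one-step comparison, are simplifications for illustration (the paper's metrics also consider longer continuations that distinguish states).

        def sequence_distinction(dfa, model_valid_next, seq1, seq2):
            """If two sequences end in different DFA states, a coherent
            world model should treat them differently, e.g. by allowing
            different continuations (one-step simplification here)."""
            if dfa.run(seq1) == dfa.run(seq2):
                return True  # same state, so the test does not apply
            return model_valid_next(seq1) != model_valid_next(seq2)

        def sequence_compression(dfa, model_valid_next, seq1, seq2):
            """If two sequences end in the same DFA state, a coherent
            world model should allow the same next moves after both."""
            if dfa.run(seq1) != dfa.run(seq2):
                return True  # different states, so the test does not apply
            return model_valid_next(seq1) == model_valid_next(seq2)

        # A stand-in "model" that simply reads off the true DFA:
        oracle = lambda seq: grid.valid_moves(grid.run(seq))
        print(sequence_compression(grid, oracle, ["left"], ["right", "left"]))  # True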

    They used these metrics to test two common classes of transformers: one trained on data generated from randomly produced sequences, and the other on data generated by following strategies.

    Incoherent world models

    Surprisingly, the researchers found that transformers which made choices randomly formed more accurate world models, perhaps because they saw a wider variety of potential next steps during training. 

    “In Othello, if you see two random computers playing rather than championship players, in theory you’d see the full set of possible moves, even the bad moves championship players wouldn’t make,” Vafa explains.

    Even though the transformers generated accurate directions and valid Othello moves in nearly every instance, the two metrics revealed that only one generated a coherent world model for Othello moves, and none performed well at forming coherent world models in the wayfinding example.

    The researchers demonstrated the implications of this by adding detours to the map of New York City, which caused all the navigation models to fail.

    “I was surprised by how quickly the performance deteriorated as soon as we added a detour. If we close just 1 percent of the possible streets, accuracy immediately plummets from nearly 100 percent to just 67 percent,” Vafa says.
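    One way to picture that experiment, again on the toy grid DFA rather than the paper's actual setup: delete a small fraction of transitions ("close streets") and check whether a route the model proposes is still drivable.

        import random

        def close_streets(transitions, fraction, seed=0):
            """Remove a random fraction of street segments (an illustrative
            stand-in for the paper's detour procedure)."""
            rng = random.Random(seed)
            n_closed = max(1, int(len(transitions) * fraction))
            closed = set(rng.sample(list(transitions), n_closed))
            return {e: t for e, t in transitions.items() if e not in closed}

        def route_is_valid(transitions, start, moves):
            state = start
            for m in moves:
                if (state, m) not in transitions:
                    return False  # this street is closed; the route breaks
                state = transitions[(state, m)]
            return True

        detoured = close_streets(grid.transitions, fraction=0.25)
        print(route_is_valid(detoured, "A", ["left", "right"]))  # may now be False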

    When they recovered the city maps the models generated, they looked like an imagined New York City with hundreds of streets crisscrossing the grid. The maps often contained random flyovers above other streets or multiple streets with impossible orientations.

    These results show that transformers can perform surprisingly well at certain tasks without understanding the rules. If scientists want to build LLMs that can capture accurate world models, they need to take a different approach, the researchers say.

    “Often, we see these models do impressive things and think they must have understood something about the world. I hope we can convince people that this is a question to think very carefully about, and we don’t have to rely on our own intuitions to answer it,” says Rambachan.

    In the future, the researchers want to tackle a more diverse set of problems, such as those where some rules are only partially known. They also want to apply their evaluation metrics to real-world, scientific problems.

    This work is funded, in part, by the Harvard Data Science Initiative, a National Science Foundation Graduate Research Fellowship, a Vannevar Bush Faculty Fellowship, a Simons Collaboration grant, and a grant from the MacArthur Foundation.
