
    PoE-World + Planner Outperforms Reinforcement Learning (RL) Baselines in Montezuma’s Revenge with Minimal Demonstration Data

    June 20, 2025

    The Importance of Symbolic Reasoning in World Modeling

    Understanding how the world works is key to creating AI agents that can adapt to complex situations. While neural network-based models, such as Dreamer, offer flexibility, they require massive amounts of data to learn effectively, far more than humans typically do. On the other hand, newer methods use program synthesis with large language models to generate code-based world models. These are more data-efficient and can generalize well from limited input. However, their use has been mostly limited to simple domains, such as text or grid worlds, as scaling to complex, dynamic environments remains a challenge due to the difficulty of generating large, comprehensive programs.

    Limitations of Existing Programmatic World Models

    Recent research has investigated the use of programs to represent world models, often leveraging large language models to synthesize Python transition functions. Approaches like WorldCoder and CodeWorldModels generate a single, large program, which limits their scalability in complex environments and their ability to handle uncertainty and partial observability. Some studies focus on high-level symbolic models for robotic planning by integrating visual input with abstract reasoning. Earlier efforts employed restricted domain-specific languages tailored to specific benchmarks or utilized conceptually related structures, such as factor graphs in Schema Networks. Theoretical models, such as AIXI, also explore world modeling using Turing machines and history-based representations.

    Introducing PoE-World: Modular and Probabilistic World Models

    Researchers from Cornell, Cambridge, The Alan Turing Institute, and Dalhousie University introduce PoE-World, an approach to learning symbolic world models by combining many small, LLM-synthesized programs, each capturing a specific rule of the environment. Instead of creating one large program, PoE-World builds a modular, probabilistic structure that can learn from brief demonstrations. This setup supports generalization to new situations, allowing agents to plan effectively, even in complex games like Pong and Montezuma’s Revenge. While it doesn’t model raw pixel data, it learns from symbolic object observations and emphasizes accurate modeling over exploration for efficient decision-making.

    Architecture and Learning Mechanism of PoE-World

    PoE-World models the environment as a combination of small, interpretable Python programs called programmatic experts, each responsible for a specific rule or behavior. These experts are weighted and combined to predict future states based on past observations and actions. By treating features as conditionally independent and learning from the full history, the model remains modular and scalable. Hard constraints refine predictions, and experts are updated or pruned as new data is collected. The model supports planning and reinforcement learning by simulating likely future outcomes, enabling efficient decision-making. Programs are synthesized using LLMs and interpreted probabilistically, with expert weights optimized via gradient descent.
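The weighted combination described above can be sketched as a product of experts over a single discrete feature. This is a minimal illustration under stated assumptions, not the authors' implementation: the three experts, the `vx` feature, the value set, and the names `predict`, `log_likelihood`, and `fit_weights` are all hypothetical. Each expert returns a distribution over the next feature value, the model multiplies the distributions raised to their weights and renormalizes, and the weights are fitted by gradient ascent on the demonstration log-likelihood (equivalent to gradient descent on its negative).

```python
import math

VALUES = [-1, 0, 1]  # hypothetical next-step values of one feature (x velocity)

def expert_move_right(state, action):
    # Rule: pressing RIGHT sets the velocity to +1; otherwise no opinion.
    if action == "RIGHT":
        return {-1: 0.05, 0: 0.05, 1: 0.90}
    return {v: 1 / 3 for v in VALUES}

def expert_move_left(state, action):
    # Rule: pressing LEFT sets the velocity to -1; otherwise no opinion.
    if action == "LEFT":
        return {-1: 0.90, 0: 0.05, 1: 0.05}
    return {v: 1 / 3 for v in VALUES}

def expert_keep_velocity(state, action):
    # Rule: the velocity tends to stay what it was.
    return {v: (0.8 if v == state["vx"] else 0.1) for v in VALUES}

EXPERTS = [expert_move_right, expert_move_left, expert_keep_velocity]

def predict(state, action, weights):
    # Product of experts: p(v) is proportional to prod_i p_i(v) ** w_i.
    scores = {
        v: sum(w * math.log(e(state, action)[v]) for w, e in zip(weights, EXPERTS))
        for v in VALUES
    }
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {v: math.exp(scores[v] - top) / z for v in VALUES}

def log_likelihood(data, weights):
    return sum(math.log(predict(s, a, weights)[nxt]) for s, a, nxt in data)

def fit_weights(data, weights, lr=0.1, steps=300):
    # Gradient ascent on the mean log-likelihood, with weights kept non-negative.
    # d/dw_i log p(y) = log p_i(y) - E_{v ~ p}[log p_i(v)]
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for state, action, nxt in data:
            p = predict(state, action, weights)
            for i, expert in enumerate(EXPERTS):
                dist = expert(state, action)
                expected = sum(p[v] * math.log(dist[v]) for v in VALUES)
                grads[i] += math.log(dist[nxt]) - expected
        weights = [max(0.0, w + lr * g / len(data)) for w, g in zip(weights, grads)]
    return weights

# A tiny hypothetical demonstration: (state, action, observed next velocity).
DEMOS = [
    ({"vx": 0}, "RIGHT", 1),
    ({"vx": 1}, "LEFT", -1),
    ({"vx": 0}, "NOOP", 0),
    ({"vx": -1}, "NOOP", -1),
]
```

Calling `fit_weights(DEMOS, [1.0, 1.0, 1.0])` returns weights that fit the demonstrations at least as well as the starting point. Because each expert only speaks to one rule, adding or removing an expert leaves the others untouched, which is the modularity the article emphasizes.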

    Empirical Evaluation on Atari Games

    The authors evaluate their agent, PoE-World + Planner, on Atari’s Pong and Montezuma’s Revenge, including harder, modified versions of these games. Using minimal demonstration data, the method outperforms baselines such as PPO, ReAct, and WorldCoder, particularly in low-data settings. PoE-World generalizes well, modeling game dynamics accurately even in altered environments without new demonstrations. It is also the only method to consistently achieve a positive score in Montezuma’s Revenge. Pre-training policies in PoE-World’s simulated environment accelerates subsequent learning in the actual game environment. Unlike WorldCoder’s limited and sometimes inaccurate models, PoE-World produces more detailed, constraint-aware representations, leading to better planning and more realistic in-game behavior.
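The article does not describe the planner itself, so the following is only a generic sketch of how a planner can use a world model: it simulates every action sequence up to a short horizon and keeps the one with the highest discounted return. The corridor environment, `toy_model`, `plan`, and all parameter values are hypothetical stand-ins, not part of PoE-World.

```python
from itertools import product

ACTIONS = ["LEFT", "RIGHT", "NOOP"]

def toy_model(state, action):
    """Hypothetical stand-in for a learned world model.

    A 1-D corridor with positions 0..5 and a key at x == 3.
    Returns (next_state, reward); picking up the key yields reward 1.
    """
    x = state["x"] + {"LEFT": -1, "RIGHT": 1, "NOOP": 0}[action]
    x = max(0, min(5, x))  # walls at both ends
    has_key = state["has_key"] or x == 3
    reward = 1.0 if has_key and not state["has_key"] else 0.0
    return {"x": x, "has_key": has_key}, reward

def plan(model, state, horizon=4, discount=0.95):
    """Exhaustive lookahead: simulate each action sequence inside the model."""
    best_return, best_plan = float("-inf"), None
    for seq in product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for t, action in enumerate(seq):
            s, r = model(s, action)
            total += (discount ** t) * r
        if total > best_return:  # strict: the first best sequence is kept
            best_return, best_plan = total, seq
    return best_plan, best_return
```

For example, `plan(toy_model, {"x": 0, "has_key": False})` returns a sequence that begins with three `RIGHT` moves. Exhaustive search grows exponentially with the horizon, so a real agent would need a more selective search; the point here is only that all simulation happens inside the model rather than the game.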

    Conclusion: Symbolic, Modular Programs for Scalable AI Planning

    In conclusion, understanding how the world works is crucial to building adaptive AI agents; however, traditional deep learning models require large datasets and struggle to update flexibly with limited input. Inspired by how humans and symbolic systems recombine knowledge, the study proposes PoE-World. This method utilizes large language models to synthesize modular, programmatic “experts” that represent different parts of the world. These experts combine compositionally to form a symbolic, interpretable world model that supports strong generalization from minimal data. Tested on Atari games like Pong and Montezuma’s Revenge, this approach demonstrates efficient planning and performance, even in unfamiliar scenarios. Code and demos are publicly available.


    Check out the Paper, Project Page and GitHub Page. All credit for this research goes to the researchers of this project.

    The post PoE-World + Planner Outperforms Reinforcement Learning RL Baselines in Montezuma’s Revenge with Minimal Demonstration Data appeared first on MarkTechPost.
