Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Sunshine And March Vibes (2025 Wallpapers Edition)

      June 1, 2025

      The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

      June 1, 2025

      How To Fix Largest Contentful Paint Issues With Subpart Analysis

      June 1, 2025

      How To Prevent WordPress SQL Injection Attacks

      June 1, 2025

      7 MagSafe accessories that I recommend every iPhone user should have

      June 1, 2025

      I replaced my Kindle with an iPad Mini as my ebook reader – 8 reasons why I don’t regret it

      June 1, 2025

      Windows 11 version 25H2: Everything you need to know about Microsoft’s next OS release

      May 31, 2025

      Elden Ring Nightreign already has a duos Seamless Co-op mod from the creator of the beloved original, and it’ll be “expanded on in the future”

      May 31, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Student Record Android App using SQLite

      June 1, 2025
      Recent

      Student Record Android App using SQLite

      June 1, 2025

      When Array uses less memory than Uint8Array (in V8)

      June 1, 2025

      Laravel 12 Starter Kits: Definite Guide Which to Choose

      June 1, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Photobooth is photobooth software for the Raspberry Pi and PC

      June 1, 2025
      Recent

      Photobooth is photobooth software for the Raspberry Pi and PC

      June 1, 2025

      Le notizie minori del mondo GNU/Linux e dintorni della settimana nr 22/2025

      June 1, 2025

      Rilasciata PorteuX 2.1: Novità e Approfondimenti sulla Distribuzione GNU/Linux Portatile Basata su Slackware

      June 1, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Google DeepMind Achieves State-of-the-Art Data-Efficient Reinforcement Learning RL with Improved Transformer World Models

    Google DeepMind Achieves State-of-the-Art Data-Efficient Reinforcement Learning RL with Improved Transformer World Models

    February 5, 2025

    Reinforcement Learning RL trains agents to maximize rewards by interacting with an environment. Online RL alternates between taking actions, collecting observations and rewards, and updating policies using this experience. Model-free RL (MFRL) maps observations to actions but requires extensive data collection. Model-based RL (MBRL) mitigates this by learning a world model (WM) for planning in an imagined environment. Standard benchmarks like Atari-100k test sample efficiency, but their deterministic nature allows memorization rather than generalization. To encourage broader skills, researchers use Crafter, a 2D Minecraft-like environment. Craftax-classic, a JAX-based version, introduces procedural environments, partial observability, and a sparse reward system, requiring deep exploration.

    MBRL methods vary based on how WMs are used—for background planning (training policies with imagined data) or decision-time planning (conducting lookahead searches during inference). As seen in MuZero and EfficientZero, decision-time planning is effective but computationally expensive for large WMs like transformers. Background planning, originating from Dyna-Q learning, has been refined in deep RL models like Dreamer, IRIS, and DART. WMs also differ in generative ability; while non-generative WMs excel in efficiency, generative WMs better integrate real and imagined data. Many modern architectures use transformers, though recurrent state-space models like DreamerV2/3 remain relevant.

    Researchers from Google DeepMind introduce an advanced MBRL method that sets a new benchmark in the Craftax-classic environment, a complex 2D survival game requiring generalization, deep exploration, and long-term reasoning. Their approach achieves a 67.42% reward after 1M steps, surpassing DreamerV3 (53.2%) and human performance (65.0%). They enhance MBRL with a robust model-free baseline, “Dyna with warmup” for real and imagined rollouts, a nearest-neighbor tokenizer for patch-based image processing, and block teacher forcing for efficient token prediction. These refinements collectively improve sample efficiency, achieving state-of-the-art performance in data-efficient RL.

    The study enhances the MFRL baseline by expanding the model size and incorporating a Gated Recurrent Unit (GRU), increasing rewards from 46.91% to 55.49%. Additionally, the study introduces an MBRL approach using a Transformer World Model (TWM) with VQ-VAE quantization, achieving 31.93% rewards. To further optimize performance, a Dyna-based method integrates real and imagined rollouts, improving learning efficiency. Replacing VQ-VAE with a patch-wise nearest-neighbor tokenizer boosts performance from 43.36% to 58.92%. These advancements demonstrate the effectiveness of combining memory mechanisms, transformer-based models, and improved observation encoding in reinforcement learning.

    The study presents results from experiments on the Craftax-classic benchmark, conducted on 8 H100 GPUs over 1M steps. Each method collected 96-length trajectories in 48 parallel environments. For MBRL methods, imaginary rollouts were generated at 200k environment steps and updated 500 times. The “MBRL ladder” progression showed significant improvements, with the best agent (M5) achieving a 67.42% reward. Ablation studies confirmed the importance of each component, such as Dyna, NNT, patches, and BTF. Compared with existing methods, the best MBRL agent achieved a state-of-the-art performance. Additionally, Craftax Full experiments demonstrated generalization to harder environments.

    In conclusion, the study introduces three key improvements to vision-based MBRL agents using TWM for background planning. These enhancements include Dyna with warmup, patch nearest-neighbor tokenization, and block teacher forcing. The proposed MBRL agent performs better on the Craftax-classic benchmark, surpassing previous state-of-the-art models and human expert rewards. Future work includes exploring generalization beyond Craftax, prioritizing experience replay, integrating off-policy RL algorithms, and refining the tokenizer for large pre-trained models like SAM and Dino-V2. Additionally, the policy will be modified to accept latent tokens from non-reconstructive world models.


    Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 75k+ ML SubReddit.

    🚨 Recommended Open-Source AI Platform: ‘IntellAgent is a An Open-Source Multi-Agent Framework to Evaluate Complex Conversational AI System’ (Promoted)

    The post Google DeepMind Achieves State-of-the-Art Data-Efficient Reinforcement Learning RL with Improved Transformer World Models appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleEnhancing LLM Capabilities with NeMo Guardrails on Amazon SageMaker JumpStart
    Next Article Build a multi-interface AI assistant using Amazon Q and Slack with Amazon CloudFront clickable references from an Amazon S3 bucket

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    June 1, 2025
    Machine Learning

    BOND 2025 AI Trends Report Shows AI Ecosystem Growing Faster than Ever with Explosive User and Developer Adoption

    June 1, 2025
    Leave A Reply Cancel Reply

    Continue Reading

    How to Build a Multilingual Social Recipe Application with Flutter and Strapi

    How to Build a Multilingual Social Recipe Application with Flutter and Strapi

    Development

    CVE-2025-48880 – FreeScout Race Condition Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Yubico Warns of 2FA Security Flaw in pam-u2f for Linux and macOS Users

    Development

    WACK: Advancing Hallucination Detection by Identifying Knowledge-Based Errors in Language Models Through Model-Specific, High-Precision Datasets and Prompting Techniques

    Development

    Highlights

    British Prime Minister wants access to messaging apps

    April 9, 2025

    British Prime Minister David Cameron has stated his belief that encrypted messaging services must have…

    Founders, if you want A+ startup success, do the homework first!

    November 23, 2024

    6 Best Free and Open Source Econometric Software

    April 30, 2025

    CVE-2025-4712 – Campcodes Sales and Inventory System SQL Injection Vulnerability

    May 15, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.