    WebDreamer: Enhancing Web Navigation Through LLM-Powered Model-Based Planning

    November 24, 2024

    Strategic planning in artificial intelligence has reached significant milestones, most visibly superhuman performance in complex games such as Go. Large Language Models (LLMs) combined with planning algorithms have also shown marked improvements on complex reasoning tasks. However, several challenges emerge when these capabilities are applied to web environments, where an agent must execute multi-step tasks across diverse websites. The first is safety: interacting with live websites risks accidentally submitting sensitive information or triggering unintended transactions. The second is irreversibility: many online actions, such as confirming a purchase or sending an email, cannot be undone, which is a significant obstacle for traditional planning algorithms that rely on backtracking.

    Several approaches have emerged to tackle web-based planning. Reactive agents, typically built on the ReAct framework, choose each action from the immediate observation without simulating future actions. These agents have evolved through prompting closed-source models, training on HTML and webpage screenshots, and improving element grounding with action-coordinate pair data. Tree search-based approaches such as Search Agent and AgentQ use best-first tree search and Monte Carlo Tree Search (MCTS) to enable exploration and multi-step planning. Finally, world models offer another approach by predicting future states and rewards, but they require task-specific training and have mainly been used to improve data efficiency in agent learning.

    Researchers from Ohio State University and Orby AI have proposed WEBDREAMER, a method that enhances language agents with model-based planning by using LLMs as world models in web environments. It relies on the knowledge of website structures and functionality that LLMs already have to simulate the outcome of each candidate action (e.g., “What would happen if I click this button?”) as a natural language description. The system then evaluates these simulated outcomes and selects the highest-scoring action at each step. Because candidate actions are simulated rather than executed, WEBDREAMER addresses the safety and irreversibility challenges that limit traditional planning methods on live websites.
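
The selection step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: `select_action`, `toy_simulate`, and `toy_score` are hypothetical names, and the two toy functions are hard-coded stand-ins for what would be LLM calls in a real agent.

```python
from typing import Callable, Sequence


def select_action(
    task: str,
    observation: str,
    candidates: Sequence[str],
    simulate: Callable[[str, str], str],
    score: Callable[[str, str], float],
) -> str:
    """Return the candidate whose simulated outcome scores highest."""
    if not candidates:
        raise ValueError("no candidate actions to choose from")
    best_action, best_score = candidates[0], float("-inf")
    for action in candidates:
        # Ask the world model: "What would happen if I take this action?"
        predicted = simulate(observation, action)
        value = score(task, predicted)
        if value > best_score:
            best_action, best_score = action, value
    return best_action


# Hypothetical stand-in for an LLM that describes the next page state.
def toy_simulate(observation: str, action: str) -> str:
    outcomes = {
        "click 'Add to cart'": "The cart now shows one item.",
        "click 'Sign out'": "The user is logged out and sees the login page.",
    }
    return outcomes.get(action, "Nothing visible changes.")


# Hypothetical stand-in for an LLM that rates progress toward the task.
def toy_score(task: str, predicted: str) -> float:
    return 1.0 if "cart" in task.lower() and "cart" in predicted.lower() else 0.0


chosen = select_action(
    "Add the laptop to the cart",
    "Product page for a laptop",
    ["click 'Sign out'", "click 'Add to cart'", "scroll down"],
    toy_simulate,
    toy_score,
)
print(chosen)
```

Nothing is executed against a website while candidates are compared; only the action that is finally returned would be sent to the browser.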

    WEBDREAMER plans through simulation in several stages. First, the system generates candidate actions in two stages: it samples the top-k actions, then uses an LLM to self-refine the list and eliminate actions that are not worth simulating. Next, it simulates a potential two-step trajectory for each remaining candidate, using the LLM both as the simulator and as the scoring function. This dual role lets the system predict and evaluate outcomes with a single model. The process repeats until a termination condition is met: a stop action, the maximum number of steps, or the same action being repeated more than three times. This design keeps exploration thorough while the refinement step limits how many candidates must be simulated.
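
A rough sketch of this loop follows, under stated assumptions. All names (`run_episode`, `ToySite`, and the `toy_*` helpers) are hypothetical, the toy functions replace LLM calls with lookups on a three-page site, and the repetition check is one reading of "beyond three times": the episode ends before an action would be executed a fourth time.

```python
STOP = "stop"
MAX_REPEATS = 3  # one reading of the text: never execute the same action a fourth time


def run_episode(task, observe, propose, refine, simulate, imagine_next, score,
                execute, max_steps=10):
    """Plan by simulation until a stop action, the step limit, or repetition."""
    history = []
    while len(history) < max_steps:
        obs = observe()
        # Stage 1 samples candidates, stage 2 prunes them before simulation.
        candidates = refine(task, obs, propose(task, obs)) or [STOP]

        def value(action):
            if action == STOP:
                return score(task, obs)  # stopping keeps the current state
            after_first = simulate(obs, action)            # imagined step 1
            follow_up = imagine_next(task, after_first)    # imagined next action
            after_second = simulate(after_first, follow_up)  # imagined step 2
            return score(task, after_second)

        action = max(candidates, key=value)
        if action == STOP or history.count(action) >= MAX_REPEATS:
            break
        execute(action)
        history.append(action)
    return history


class ToySite:
    """Hypothetical three-page site: home -> results -> product."""
    TRANSITIONS = {
        ("home", "click search"): "results",
        ("results", "click first result"): "product",
    }

    def __init__(self):
        self.page = "home"

    def observe(self):
        return self.page

    def execute(self, action):
        self.page = self.TRANSITIONS.get((self.page, action), self.page)


# Toy stand-ins for the LLM roles (all hypothetical).
def toy_propose(task, page):
    return ["click search", "click first result", "click logo", STOP]


def toy_refine(task, page, candidates):
    # Keep only actions that do something on the current page, plus STOP.
    return [a for a in candidates if a == STOP or (page, a) in ToySite.TRANSITIONS]


def toy_simulate(page, action):
    return ToySite.TRANSITIONS.get((page, action), page)


def toy_imagine_next(task, page):
    return {"home": "click search", "results": "click first result"}.get(page, STOP)


def toy_score(task, page):
    return {"home": 0.0, "results": 0.5, "product": 1.0}[page]


site = ToySite()
history = run_episode("open the product page", site.observe, toy_propose,
                      toy_refine, toy_simulate, toy_imagine_next, toy_score,
                      site.execute)
print(history, site.page)
```

Only `execute` touches the site. Every lookahead goes through `simulate`, which is why no backtracking on the live page is needed.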

    WEBDREAMER shows substantial gains over reactive agents on multiple benchmarks, including a 33.3% relative improvement on the VisualWebArena (VWA) dataset. On the Mind2Web-live dataset, the improvement is a more modest 13.1%, largely because the dataset has low discriminative power, as shown by the minimal performance differences across base LLMs. Although WEBDREAMER’s overall success rate falls slightly below that of tree-search baselines, it is a more practical option for real-world website interactions. The researchers also conducted a more granular analysis comparing the method with the reactive baseline on VWA across multiple dimensions.
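
Because the figures above are relative rather than absolute, a short worked example may help. The success rates below are hypothetical values chosen only to illustrate the arithmetic; they are not taken from the paper. Under that assumption, a rise from 15% to 20% is a gain of 5 percentage points but a relative improvement of 33.3%.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Relative change of `new` over `baseline`, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Hypothetical success rates, used only to illustrate the calculation.
baseline_rate = 0.15
planner_rate = 0.20

gain = relative_improvement(baseline_rate, planner_rate)
print(f"absolute gain: {planner_rate - baseline_rate:.1%}, relative gain: {gain:.1%}")
```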

    In conclusion, the researchers introduced WEBDREAMER, a method that uses LLMs as world models for planning in complex web environments and a notable step forward for AI-driven web navigation. It improves significantly on reactive baselines and is more practical than traditional tree search methods. However, the method has two primary limitations: its planning algorithm is relatively simple, and its computational cost is considerable, with each VWA task costing approximately $1 using GPT-4. These limitations point to future research on improving LLM efficiency and on more advanced, cost-effective planning algorithms for long-horizon tasks.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

    The post WebDreamer: Enhancing Web Navigation Through LLM-Powered Model-Based Planning appeared first on MarkTechPost.
