
    This AI Paper Introduces PLAN-AND-ACT: A Modular Framework for Long-Horizon Planning in Web-Based Language Agents

    March 26, 2025

    Large language models are powering a new wave of digital agents to handle sophisticated web-based tasks. These agents are expected to interpret user instructions, navigate interfaces, and execute complex commands in ever-changing environments. The difficulty lies not in understanding language but in translating that understanding into precise, sequenced actions while adapting to dynamic contexts. Success for long-horizon tasks like booking travel or retrieving specific web data depends on managing a sequence of steps that evolves with each action. Despite major progress in language capabilities, creating agents that can effectively plan and adapt at each step remains an unsolved problem.

    Decomposing broad goals into actionable steps is a major challenge in building such agents. When a user requests “follow the top contributor of this GitHub project,” the agent must interpret the command, work out how to navigate to the contributors section, identify the relevant person, and trigger the follow action. The task becomes even more complex in dynamic environments where content may shift between executions. Without a clear strategy for planning and for updating the plan, agents can make inconsistent decisions or fail entirely. The scarcity of training data that shows how to plan and execute long tasks correctly adds another layer of difficulty.

    Previously, researchers attempted to address these issues with models that either relied on single-agent strategies or applied reinforcement learning to guide actions. Single-agent systems like ReAct attempted to merge reasoning and execution but often faltered as the model was overwhelmed by thinking and acting at once. Reinforcement learning approaches showed promise but proved unstable and highly sensitive to environment-specific tuning. Collecting training data for these methods required extensive interaction with environments, making it time-consuming and impractical to scale. These methods also struggled to maintain performance consistency when tasks changed mid-process.

    Researchers from UC Berkeley, the University of Tokyo, and ICSI introduced a new PLAN-AND-ACT system. Companies like Apple, Nvidia, Microsoft, and Intel supported the work. This framework splits task planning and execution into two modules: a PLANNER and an EXECUTOR. The PLANNER is tasked with creating a structured plan based on the user’s request, essentially outlining what steps need to be taken. The EXECUTOR then translates each step into environment-specific actions. By separating these responsibilities, the system allows the PLANNER to focus on strategy while the EXECUTOR handles execution, improving the reliability of both components. This modular design marks a significant shift from previous approaches.
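
The split between the two modules can be pictured with a minimal sketch. This is not the authors' implementation: `toy_planner`, `toy_executor`, `plan_and_act`, and the `Action` type are hypothetical names, and the hard-coded lookups stand in for what would be language model calls in the real system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str    # e.g. "click" or "read" (illustrative action types)
    target: str  # hypothetical description of the page element


def toy_planner(request: str) -> List[str]:
    """Stand-in for the PLANNER: map a request to high-level steps."""
    if "top contributor" in request:
        return [
            "open the contributors section",
            "identify the top contributor",
            "follow that contributor",
        ]
    return [request]  # fall back to treating the request as one step


def toy_executor(step: str) -> List[Action]:
    """Stand-in for the EXECUTOR: map one step to concrete actions."""
    table = {
        "open the contributors section": [Action("click", "Contributors tab")],
        "identify the top contributor": [Action("read", "first contributor row")],
        "follow that contributor": [Action("click", "Follow button")],
    }
    return table.get(step, [Action("noop", step)])


def plan_and_act(
    request: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str], List[Action]],
) -> List[Action]:
    """Plan once, then let the executor ground each step in actions."""
    actions: List[Action] = []
    for step in planner(request):
        actions.extend(executor(step))
    return actions


actions = plan_and_act(
    "follow the top contributor of this GitHub project",
    toy_planner,
    toy_executor,
)
```

Because `plan_and_act` only depends on the two callables, either side can be swapped or retrained without touching the other, which is the property the modular design relies on.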

    The methodology behind PLAN-AND-ACT focuses heavily on scalable training. Since human-annotated planning data is limited, the researchers introduced a synthetic data generation pipeline. They began by collecting action trajectories from simulated agents, that is, sequences of clicks, inputs, and responses. Large language models then analyzed these trajectories to reconstruct high-level plans grounded in actual outcomes. For example, a plan might specify identifying the top contributor, while the actions linked to it include clicking the “Contributors” tab and parsing the resulting HTML. The team expanded their dataset with 10,000 additional synthetic plans and then generated 5,000 more targeted plans based on failure analysis. This synthetic training method saved time and produced high-quality data that reflected real execution needs.
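
One way to picture the grounding step is as turning an annotated trajectory into separate training examples for the two modules. The sketch below is an illustration under stated assumptions, not the paper's data format: `toy_annotator` is a hard-coded stand-in for the language model that reconstructs plans, and the `(step, start, end)` span representation and the `build_examples` helper are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Trajectory = List[str]       # one low-level action per entry, as text
Span = Tuple[str, int, int]  # (plan step, start index, end index), half-open


def toy_annotator(request: str, trajectory: Trajectory) -> List[Span]:
    """Stand-in for an LLM that labels which actions realize which step."""
    return [
        ("identify the top contributor", 0, 2),
        ("follow that contributor", 2, 3),
    ]


def build_examples(
    request: str,
    trajectory: Trajectory,
    annotate: Callable[[str, Trajectory], List[Span]],
) -> Tuple[Dict, List[Dict]]:
    """Derive one planner example and one executor example per step."""
    spans = annotate(request, trajectory)

    # Require the steps to cover the trajectory in order with no gaps,
    # so every plan step is grounded in actions that actually happened.
    cursor = 0
    for _, start, end in spans:
        if start != cursor or end <= start:
            raise ValueError("plan steps must tile the trajectory in order")
        cursor = end
    if cursor != len(trajectory):
        raise ValueError("some actions are not assigned to any plan step")

    planner_example = {"input": request, "output": [s for s, _, _ in spans]}
    executor_examples = [
        {"input": step, "output": trajectory[start:end]}
        for step, start, end in spans
    ]
    return planner_example, executor_examples


trajectory = [
    'click "Contributors" tab',
    "parse contributor list HTML",
    'click "Follow"',
]
planner_ex, executor_exs = build_examples(
    "follow the top contributor of this GitHub project",
    trajectory,
    toy_annotator,
)
```

The coverage check reflects the idea that plans are reconstructed from real outcomes: a step with no supporting actions, or an action with no step, is rejected rather than silently kept.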

    In testing, PLAN-AND-ACT achieved a task success rate of 53.94% on the WebArena-Lite benchmark, surpassing the previous best result of 49.1% from WebRL. Without any planner, a base executor achieved only 9.85%. Adding a non-finetuned planner raised performance to 29.63%, while finetuning on 10,000 synthetic plans brought results up to 44.24%. Incorporating dynamic replanning added a final reported gain of 10.31 percentage points. Across all experiments, most of the improvement came from enhancing the PLANNER rather than the EXECUTOR. Even with a base EXECUTOR, a strong PLANNER led to substantial increases in success rate, supporting the researchers’ hypothesis that separating planning and execution yields better task outcomes.
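
Dynamic replanning can be sketched as a loop in which the planner is consulted again after every executed step, so the remaining plan reflects what the page actually did. The code below is a toy illustration, assuming a hypothetical `ToyPage` environment and a rule-based `toy_replan` in place of a language model; it is not the paper's algorithm.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (executed step, observation) pairs


class ToyPage:
    """Hypothetical page where the contributors link hides behind a menu."""

    def __init__(self) -> None:
        self.menu_open = False
        self.on_contributors = False
        self.followed = False

    def execute(self, step: str) -> str:
        if step == "open menu":
            self.menu_open = True
            return "menu opened"
        if step == "open contributors":
            if not self.menu_open:
                return "contributors link not visible"
            self.on_contributors = True
            return "contributors page shown"
        if step == "follow top contributor":
            if not self.on_contributors:
                return "no contributor list on page"
            self.followed = True
            return "followed"
        return "unknown step"


def toy_replan(request: str, history: History) -> List[str]:
    """Stand-in planner: return the remaining plan given what happened."""
    if not history:
        return ["open contributors", "follow top contributor"]
    _, observation = history[-1]
    if observation == "contributors link not visible":
        return ["open menu", "open contributors", "follow top contributor"]
    if observation == "menu opened":
        return ["open contributors", "follow top contributor"]
    if observation == "contributors page shown":
        return ["follow top contributor"]
    return []  # task finished, or nothing sensible left to try


def run_with_replanning(
    request: str,
    replan: Callable[[str, History], List[str]],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> History:
    """Execute only the first step of each fresh plan, then replan."""
    history: History = []
    for _ in range(max_steps):
        plan = replan(request, history)
        if not plan:
            break
        step = plan[0]
        history.append((step, execute(step)))
    return history


page = ToyPage()
history = run_with_replanning("follow the top contributor", toy_replan, page.execute)
```

In this toy run the initial plan fails at its first step, and the revised plan inserts the missing "open menu" step. A fixed plan would have no way to recover from that observation.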

    In conclusion, the paper shows how closing the gap between goal understanding and environment interaction can lead to more effective AI systems. By focusing on structured planning and scalable data generation, the researchers propose a method that addresses a specific problem and a framework that could extend to broader applications. PLAN-AND-ACT shows that effective planning, not just execution, is critical to AI agent success in complex environments.


    All credit for this research goes to the researchers of this project.

    The post This AI Paper Introduces PLAN-AND-ACT: A Modular Framework for Long-Horizon Planning in Web-Based Language Agents appeared first on MarkTechPost.
