Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Sunshine And March Vibes (2025 Wallpapers Edition)

      May 17, 2025

      The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

      May 17, 2025

      How To Fix Largest Contentful Paint Issues With Subpart Analysis

      May 17, 2025

      How To Prevent WordPress SQL Injection Attacks

      May 17, 2025

      Microsoft’s allegiance isn’t to OpenAI’s pricey models — Satya Nadella’s focus is selling any AI customers want for maximum profits

      May 17, 2025

      If you think you can do better than Xbox or PlayStation in the Console Wars, you may just want to try out this card game

      May 17, 2025

      Surviving a 10 year stint in dev hell, this retro-styled hack n’ slash has finally arrived on Xbox

      May 17, 2025

      Save $400 on the best Samsung TVs, laptops, tablets, and more when you sign up for Verizon 5G Home or Home Internet

      May 17, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      NodeSource N|Solid Runtime Release – May 2025: Performance, Stability & the Final Update for v18

      May 17, 2025
      Recent

      NodeSource N|Solid Runtime Release – May 2025: Performance, Stability & the Final Update for v18

      May 17, 2025

      Big Changes at Meteor Software: Our Next Chapter

      May 17, 2025

      Apps in Generative AI – Transforming the Digital Experience

      May 17, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Microsoft’s allegiance isn’t to OpenAI’s pricey models — Satya Nadella’s focus is selling any AI customers want for maximum profits

      May 17, 2025
      Recent

      Microsoft’s allegiance isn’t to OpenAI’s pricey models — Satya Nadella’s focus is selling any AI customers want for maximum profits

      May 17, 2025

      If you think you can do better than Xbox or PlayStation in the Console Wars, you may just want to try out this card game

      May 17, 2025

      Surviving a 10 year stint in dev hell, this retro-styled hack n’ slash has finally arrived on Xbox

      May 17, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Optimizing Agent Planning: A Parametric AI Approach to World Knowledge

    Optimizing Agent Planning: A Parametric AI Approach to World Knowledge

    May 27, 2024

    Large Language Models (LLMs) have advanced natural language processing tasks significantly. Recently, using LLMs for physical world planning tasks has shown promise. However, LLMs, primarily autoregressive models, often fail to understand the real world, leading to hallucinatory actions and trial-and-error behavior. Unlike LLMs, humans utilize global task knowledge and local state knowledge to mentally rehearse and execute tasks efficiently, avoiding blind trial-and-error and confusion during the planning and execution stages.

    Existing work in LLM-based agent systems focuses on agent planning, external tool utilization, and code generation, often fine-tuning open-source LLMs. These approaches may lead to trial-and-error actions due to a lack of environmental cognition. Knowledge-augmented agent planning, using pre-trained knowledge or structured prompts, faces challenges in transferring across tasks. 

    Inspired by the human approach to planning, researchers from Zhejiang University – Ant Group Joint Laboratory of Knowledge Graph, National University of Singapore, and Alibaba Group developed a parametric World Knowledge Model (WKM) for agent planning. WKM is built on knowledge from both expert and explored trajectories. The agent model synthesizes task knowledge by comparing these trajectories and summarizes state knowledge for each planning step. This knowledge is integrated into expert trajectories to train the WKM. During planning, WKM provides global task knowledge and maintains dynamic state knowledge, guiding the agent and preventing hallucinatory actions through kNN retrieval and weighted predictions.

    The agent model self-synthesizes task knowledge by comparing expert and sampled trajectories. An experienced agent generates high-quality rejected trajectories, enhancing task knowledge beyond supervised fine-tuning. Task knowledge guides global planning, avoiding blind trial-and-error. State knowledge, summarized at each planning step from expert trajectories, constrains local planning to prevent hallucinatory actions. A state knowledge base, formed by combining state knowledge with preceding and subsequent actions, facilitates retrieval without overloading the context, ensuring effective and accurate agent planning.

    The method is evaluated on ALFWorld, WebShop, and ScienceWorld datasets, with unseen tasks testing generalization. ALFWorld uses binary rewards, while WebShop and ScienceWorld use dense rewards. The models tested include Mistral-7B, Gemma-7B, and Llama-3-8B, compared against prompt-based baselines (REACT, Reflexion), fine-tuning baselines (NAT, ETO), KNOWAGENT, and ChatGPT/GPT-4. The approach, through LoRA training alone, surpasses GPT-4 on ALFWorld (44.29→73.57 on seen, 38.05→76.87 on unseen) and WebShop (62.76→66.64), and fine-tuning baselines, demonstrating that integrating world knowledge is more effective than further fine-tuning on negative examples. WKM shows superior performance and generalization compared to human-designed knowledge methods like KNOWAGENT.

    This research develops a parametric WKM to enhance language agent model planning. The WKM provides task knowledge for global planning and state knowledge for local planning. Results show WKM’s superior performance on GPT-4 and state-of-the-art models, outperforming strong baselines. Analytical experiments demonstrate WKM’s ability to reduce trial-and-error, improve generalization to unseen tasks, achieve weak-guide-strong, and extend to unified world knowledge training. 

    Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

    If you like our work, you will love our newsletter..

    Don’t Forget to join our 43k+ ML SubReddit

    The post Optimizing Agent Planning: A Parametric AI Approach to World Knowledge appeared first on MarkTechPost.

    Source: Read More 

    Hostinger
    Facebook Twitter Reddit Email Copy Link
    Previous ArticleThis AI Study from MIT Proposes a Significant Refinement to the simple one-dimensional linear representation hypothesis
    Next Article A Comprehensive Review of Survey on Efficient Multimodal Large Language Models

    Related Posts

    Development

    February 2025 Baseline monthly digest

    May 17, 2025
    Development

    Learn A1 Level Spanish

    May 17, 2025
    Leave A Reply Cancel Reply

    Continue Reading

    The browser for web developers

    Development

    CVE-2025-4359 – iSourcecode Gym Management System SQL Injection Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    New Golang-Based Backdoor Uses Telegram Bot API for Evasive C2 Operations

    Development

    ASUS releases fix for AMI bug that lets hackers brick servers

    Security

    Highlights

    Development

    The Xbox Series X Mini Fridge is more than a meme, it’s the perfect Christmas gift

    December 20, 2024

    The Xbox Series X Mini Fridge can keep eight cans and some snacks cool. It’s…

    Navigating the Landscape of CLIP: Investigating Data, Architecture, and Training Strategies

    April 18, 2024

    Capture and diagnose I/O bottlenecks on Amazon RDS for SQL Server

    November 11, 2024

    Aligning AI with human values

    February 4, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.