    Mobile-Agent-E: A Hierarchical Multi-Agent Framework Combining Cognitive Science and AI to Redefine Complex Task Handling on Smartphones

    January 24, 2025

    Smartphones are essential tools in daily life. However, the complexity of tasks on mobile devices often leads to frustration and inefficiency. Navigating applications and managing multi-step processes consumes time and effort. Advancements in AI have introduced large multimodal models (LMMs) that enable mobile assistants to perform intricate operations autonomously. While these innovations aim to simplify technology, they often fail to meet practical demands. Addressing these gaps requires advanced AI capabilities and adaptable systems.

    Current mobile assistants struggle to handle complex tasks requiring long-term planning, reasoning, and adaptability. Tasks like creating itineraries or comparing prices involve multiple steps across platforms. These systems treat each task as isolated and cannot learn from experience or optimize performance for repeated tasks, which leads to inefficiency. They also allocate identical resources to all tasks regardless of complexity, which reduces effectiveness in demanding scenarios.

    Some frameworks address these challenges but remain limited in planning and decision-making. Current mobile agents like AppAgent and Mobile-Agent-v1 focus on short, predefined tasks. Systems like Mobile-Agent-v2, despite improved planning, fail to incorporate a hierarchical structure for effective task delegation and refinement. These limitations highlight the need for more advanced mobile assistant designs.

    Researchers from the University of Illinois Urbana-Champaign and Alibaba Group have developed Mobile-Agent-E, a novel mobile assistant that addresses these challenges through a hierarchical multi-agent framework. The system features a Manager agent responsible for planning and breaking down tasks into sub-goals, supported by four subordinate agents: Perceptor, Operator, Action Reflector, and Notetaker. These agents specialize, respectively, in visual perception, immediate action execution, error verification, and information aggregation. A standout feature of Mobile-Agent-E is its self-evolution module, which includes a long-term memory system. This memory is divided into two components:

    1. Tips, which provide generalized guidance based on previous tasks
    2. Shortcuts, which are reusable sequences of operations tailored to specific recurring subroutines
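
The division of labour described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the real system builds on large multimodal models, whereas here every agent is a plain function. All names (`manager_plan`, `perceptor`, `operator`, `action_reflector`, `run_task`, `StepResult`) and the dictionary-based screen are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    subgoal: str
    action: str
    success: bool


def manager_plan(task: str) -> list[str]:
    # Manager: break the task into sub-goals (toy rule: split on " then ").
    return [part.strip() for part in task.split(" then ") if part.strip()]


def perceptor(screen: dict) -> list[str]:
    # Perceptor: report what is visible (toy rule: sorted element labels).
    return sorted(screen.get("elements", []))


def operator(subgoal: str, visible: list[str]) -> str:
    # Operator: choose one atomic action for the current sub-goal.
    for label in visible:
        if label.lower() in subgoal.lower():
            return f"Tap({label})"
    return "Swipe(up)"  # nothing relevant on screen, so scroll


def action_reflector(action: str) -> bool:
    # Action Reflector: check the outcome (toy rule: only a Tap counts as progress).
    return action.startswith("Tap(")


def run_task(task: str, screen: dict) -> tuple[list[StepResult], list[str]]:
    results: list[StepResult] = []
    notes: list[str] = []  # Notetaker: information gathered along the way
    for subgoal in manager_plan(task):
        visible = perceptor(screen)
        action = operator(subgoal, visible)
        ok = action_reflector(action)
        results.append(StepResult(subgoal, action, ok))
        if ok:
            notes.append(f"{subgoal}: done via {action}")
    return results, notes


results, notes = run_task(
    "open Maps then search Cafe",
    {"elements": ["Maps", "Cafe", "Notes"]},
)
```

The point of the sketch is the control flow: the Manager plans once at a high level, while the other four roles run once per sub-goal and handle perception, action, verification, and note keeping.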

    Mobile-Agent-E operates by continuously refining its performance through feedback loops. After completing each task, the system’s Experience Reflectors update its Tips and propose new Shortcuts based on interaction history. These updates are inspired by human cognitive processes, where episodic memory informs future decisions, and procedural knowledge facilitates efficient task execution. For example, if a user frequently performs a sequence of actions, such as searching for a location and creating a note, the system creates a Shortcut to streamline this process in the future. Mobile-Agent-E balances high-level planning and low-level action precision by incorporating these learnings into its hierarchical framework.
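
One way to picture the self-evolution step is as a post-task pass over the interaction history that writes into long-term memory. The sketch below is an assumption-laden simplification: simple frequency counting over action sequences stands in for the model-driven Experience Reflectors. `LongTermMemory`, `add_tip`, and `propose_shortcuts` are hypothetical names, not part of the Mobile-Agent-E codebase.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LongTermMemory:
    tips: list[str] = field(default_factory=list)
    shortcuts: dict[str, tuple[str, ...]] = field(default_factory=dict)


def add_tip(memory: LongTermMemory, tip: str) -> None:
    # Tips are general guidance; store each one only once.
    if tip not in memory.tips:
        memory.tips.append(tip)


def propose_shortcuts(
    histories: list[list[str]],
    memory: LongTermMemory,
    length: int = 3,
    min_tasks: int = 2,
) -> list[str]:
    # Count, per task, which action sequences of the given length occurred.
    counts: Counter = Counter()
    for actions in histories:
        seen = {
            tuple(actions[i:i + length])
            for i in range(len(actions) - length + 1)
        }
        counts.update(seen)  # each task contributes at most 1 per sequence

    # Promote sequences that recur across enough tasks to named Shortcuts.
    added = []
    for seq, n in counts.items():
        name = "_".join(step.lower() for step in seq)
        if n >= min_tasks and name not in memory.shortcuts:
            memory.shortcuts[name] = seq
            added.append(name)
    return sorted(added)


memory = LongTermMemory()
histories = [
    ["Open_Maps", "Tap", "Type", "Enter", "Open_Notes"],
    ["Tap", "Type", "Enter", "Swipe"],
]
new_shortcuts = propose_shortcuts(histories, memory)
add_tip(memory, "Confirm the search box is focused before typing.")
```

With these two hypothetical histories, only the sequence Tap, Type, Enter appears in both tasks, so it is the only one stored as a Shortcut. Running the function again on the same data adds nothing new.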

    The performance of Mobile-Agent-E has been tested using a new benchmark called Mobile-Eval-E, which evaluates the system’s ability to handle complex real-world tasks. Compared to existing models, Mobile-Agent-E achieves significantly higher satisfaction scores, with a 15% increase in task completion rates. Also, evolved Tips and Shortcuts reduce computational overhead, enabling faster task execution without compromising accuracy. For instance, a single Shortcut that combines actions like “Tap,” “Type,” and “Enter” can save two decision-making iterations, improving efficiency. The system’s hierarchical design enhances error recovery, allowing it to adapt to unforeseen challenges during task execution.
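
The "two decision-making iterations" figure follows from simple counting: three atomic actions need three decisions, while one Shortcut covering all three needs one. The sketch below makes that arithmetic explicit. It assumes each atomic action or Shortcut invocation costs exactly one decision, and `count_decisions` is a hypothetical helper written for this illustration.

```python
def count_decisions(actions: list[str], shortcuts: list[tuple[str, ...]]) -> int:
    # Prefer the longest matching Shortcut at each position; ignore empty ones.
    ordered = sorted((tuple(s) for s in shortcuts if s), key=len, reverse=True)
    decisions = 0
    i = 0
    while i < len(actions):
        for seq in ordered:
            if tuple(actions[i:i + len(seq)]) == seq:
                i += len(seq)  # one decision covers the whole Shortcut
                break
        else:
            i += 1  # no Shortcut applies: one decision per atomic action
        decisions += 1
    return decisions


actions = ["Tap", "Type", "Enter"]
without_shortcut = count_decisions(actions, [])
with_shortcut = count_decisions(actions, [("Tap", "Type", "Enter")])
saved = without_shortcut - with_shortcut
```

Here `without_shortcut` is 3 and `with_shortcut` is 1, so `saved` is 2, matching the example in the text. More generally, under this cost assumption a Shortcut of length n saves n - 1 decisions each time it applies.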

    Key takeaways from this research include the following:  

    1. Mobile-Agent-E features a Manager agent supported by four specialized subordinate agents, enabling efficient task delegation and execution.  
    2. The system continuously updates its Tips and Shortcuts, inspired by human cognitive processes, to improve performance and reduce redundant errors.
    3. Shortcuts reduce computational overhead, resulting in faster task execution with fewer resources. For example, task completion time decreased by 20% compared to previous models.
    4. Mobile-Agent-E achieved a 15% increase in satisfaction scores compared to state-of-the-art models, demonstrating its effectiveness in real-world applications.
    5. The system’s capabilities extend to various scenarios, such as planning itineraries, managing notes, and comparing prices across apps, showcasing its versatility and adaptability. 

    In conclusion, Mobile-Agent-E bridges the gap between user needs and technological capabilities by addressing critical challenges in task management, planning, and decision-making. Its hierarchical framework and self-evolution capabilities enhance efficiency and set a new benchmark for intelligent mobile assistants. This research highlights the potential of AI-driven solutions to transform human-device interaction, making technology more accessible and intuitive for all users.


    Check out the Paper, GitHub Page and Project Page. All credit for this research goes to the researchers of this project.

    The post Mobile-Agent-E: A Hierarchical Multi-Agent Framework Combining Cognitive Science and AI to Redefine Complex Task Handling on Smartphones appeared first on MarkTechPost.
