Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Error’d: You Talkin’ to Me?

      September 20, 2025

      The Psychology Of Trust In AI: A Guide To Measuring And Designing For User Confidence

      September 20, 2025

      This week in AI updates: OpenAI Codex updates, Claude integration in Xcode 26, and more (September 19, 2025)

      September 20, 2025

      Report: The major factors driving employee disengagement in 2025

      September 20, 2025

      DistroWatch Weekly, Issue 1140

      September 21, 2025

      Distribution Release: DietPi 9.17

      September 21, 2025

      Development Release: Zorin OS 18 Beta

      September 19, 2025

      Distribution Release: IPFire 2.29 Core 197

      September 19, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      @ts-ignore is almost always the worst option

      September 22, 2025
      Recent

      @ts-ignore is almost always the worst option

      September 22, 2025

      MutativeJS v1.3.0 is out with massive performance gains

      September 22, 2025

      Student Performance Prediction System using Python Machine Learning (ML)

      September 21, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      DistroWatch Weekly, Issue 1140

      September 21, 2025
      Recent

      DistroWatch Weekly, Issue 1140

      September 21, 2025

      Distribution Release: DietPi 9.17

      September 21, 2025

      Hyprland Made Easy: Preconfigured Beautiful Distros

      September 20, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Meet CoAct-1: A Novel Multi-Agent System that Synergistically Combines GUI-based Control with Direct Programmatic Execution

    Meet CoAct-1: A Novel Multi-Agent System that Synergistically Combines GUI-based Control with Direct Programmatic Execution

    August 7, 2025

    A Team of researchers from USC, Salesforce AI and University of Washington have introduced CoAct-1, a pioneering multi-agent computer-using agent (CUA) that marks a significant leap in autonomous computer operation. By elevating coding to a first-class action—on par with traditional GUI manipulation—CoAct-1 overcomes longstanding challenges of efficiency and reliability in complex, long-horizon computer tasks. On the demanding OSWorld benchmark, CoAct-1 sets a new gold standard, achieving a state-of-the-art (SOTA) success rate of 60.76%, making it the first CUA agent to surpass the 60% mark.

    Why CoAct-1? Bridging the Efficiency Gap in Computer-Using Agents

    Conventional CUA agents rely solely on pixel-based GUI interaction—emulating human users by clicking, typing, and navigating interfaces. While this approach mimics user workflows, it proves fragile and inefficient for intricate, multi-step tasks, especially those involving dense UI layouts, multi-app pipelines, or complex OS operations. Single errors such as a mis-click can derail entire workflows, and sequence lengths balloon as tasks increase in complexity.

    Efforts to mitigate these issues have included augmenting GUI agents with high-level planners, as seen in systems like GTA-1 and modular multi-agent frameworks. However, these methods cannot escape the bottleneck of GUI-centric action spaces, ultimately limiting both efficiency and robustness.

    CoAct-1: Hybrid Architecture with Coding as Action

    CoAct-1 takes a fundamentally different approach by integrating three specialized agents:

    • Orchestrator: The high-level planner that decomposes complex tasks and dynamically delegates each subtask either to the Programmer or the GUI Operator based on task requirements.
    • Programmer: Executes backend operations—file management, data processing, environment configuration—directly via Python or Bash scripts, bypassing cumbersome GUI action sequences.
    • GUI Operator: Uses a vision-language model to interact with visual interfaces when human-like UI navigation is indispensable.

    This hybrid model enables CoAct-1 to strategically substitute brittle and lengthy mouse-keyboard operations with concise, reliable code execution, while still leveraging GUI interactions where necessary.

    Evaluation on OSWorld: Record-Setting Performance

    OSWorld—a leading benchmark featuring 369 tasks spanning office productivity, IDEs, browsers, file managers, and multi-app workflows—proves an exacting testbed for agentic systems. Each task mirrors real-world language goals and is assessed by a granular rule-based scoring system.

    Results

    • Overall SOTA Success Rate: CoAct-1 achieves 60.76% on the 100+ step category—the first CUA agent to cross the 60-point threshold. This outpaces GTA-1 (53.10%), OpenAI CUA 4o (31.40%), UI-TARS-1.5 (29.60%), and other leading frameworks.
    • Stepped Allowance Performance: At a 100-step budget, CoAct-1 scores 59.93%, again leading all competitors.
    • Efficiency: Completes tasks with an average of 10.15 steps per successful task, compared to 15.22 for GTA-1, 14.90 for UI-TARS, and with much higher success than OpenAI CUA 4o, which, despite fewer steps (6.14), achieves only 31.40% success.

    Breakdown

    CoAct-1 dominates across task types, with especially large gains in workflows benefitting from code execution:

    • Multi-App: 47.88% (vs. GTA-1’s 38.34%)
    • OS Tasks: 75.00%
    • VLC: 66.07%
    • In productivity and IDE domains (LibreOffice Calc, Writer, VSCode), it consistently leads or ties with the SOTA.

    Key Insights: What Drives CoAct-1’s Gains?

    • Coding Actions Replace Redundant GUI Sequences: For operations like batch image resizing or advanced file manipulations, single scripts replace dozens of error-prone clicks, reducing both steps and risk of failure.
    • Dynamic Delegation: The Orchestrator’s flexible task assignment ensures optimal use of coding vs. GUI actions.
    • Improvement with Stronger Backbones: The best configuration uses OpenAI CUA 4o for the GUI Operator, OpenAI o3 for the Orchestrator, and o4-mini for the Programmer, reaching the top 60.76% score. Systems using only smaller or less capable backbones score significantly lower.
    • Efficiency Correlates with Reliability: Fewer steps directly reduce opportunities for error—the single strongest predictor of successful completion.

    Conclusion: A Leap Forward in Generalized Computer Automation

    By making coding a first-class system action alongside GUI manipulation, CoAct-1 delivers both a quantum leap in success and efficiency, and illustrates the practical path forward for scalable, reliable autonomous computer agents. Its hybrid architecture and dynamic execution logic set a new high-water mark for the CUA field, heralding robust advances in real-world computer automation.


    Check out the Paper and Technical details. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

    The post Meet CoAct-1: A Novel Multi-Agent System that Synergistically Combines GUI-based Control with Direct Programmatic Execution appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleHow to Free Up and Automatically Manage Disk Space for WSL on Windows 10/11
    Next Article NVIDIA XGBoost 3.0: Training Terabyte-Scale Datasets with Grace Hopper Superchip

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    September 3, 2025
    Machine Learning

    Announcing the new cluster creation experience for Amazon SageMaker HyperPod

    September 3, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    CVE-2025-52896 – Frappe Cross-Site Scripting (XSS) via Data Import Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    How to Build RAG AI Agents with TypeScript

    Development

    Updates to Apple’s On-Device and Server Foundation Language Models

    Machine Learning

    Are cybercriminals hacking your systems – or just logging in?

    Development

    Highlights

    Linux

    Rilasciato il Browser Web Open Source Mozilla Firefox 142

    August 19, 2025

    Firefox è un browser web open source sviluppato da Mozilla, un’organizzazione no-profit che si impegna…

    CVE-2024-41195 – Ocuco Innovation INNOVASERVICEINTF.EXE Privilege Escalation Remote Authentication Bypass

    May 22, 2025

    CVE-2025-43555 – Animate Integer Underflow Allows Arbitrary Code Execution

    May 13, 2025
    Boost team productivity with Amazon Q Business Insights

    Boost team productivity with Amazon Q Business Insights

    April 9, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.