    OS-Genesis: A Novel GUI Data Synthesis Pipeline that Reverses the Conventional Trajectory Collection Process

    January 4, 2025

    Building GUI agents that carry out human-like tasks on graphical user interfaces faces a critical obstacle: collecting high-quality trajectory data for training. Existing methods rely either on expensive, time-consuming human supervision or on synthetic data that rarely reflects the diversity and dynamics of real-world environments. These constraints limit the scalability and effectiveness of GUI agents and keep them from acting autonomously and adapting to diverse, changing environments.

    Traditional data acquisition for GUI agents is task-driven. Human annotation is labor-intensive because people must both design the tasks and annotate the trajectories. Synthetic data reduces the dependency on humans, but it still depends on pre-defined high-level tasks, which limits the scope and scale of the data. Errors in intermediate steps or conflicting objectives within a task produce incoherent trajectories and lower the quality of the training data. As a result, agents trained this way generalize poorly to dynamic or unfamiliar environments.

    Researchers from Shanghai AI Laboratory, The University of Hong Kong, Johns Hopkins University, Shanghai Jiao Tong University, the University of Oxford, and Hong Kong University of Science and Technology propose OS-Genesis, which addresses these challenges through interaction-driven reverse task synthesis. Instead of starting from predetermined tasks, GUI agents first explore an environment by clicking, scrolling, and typing on GUI elements. These interactions are then analyzed retrospectively: each one is translated into a low-level instruction and then contextualized as a high-level task. A Trajectory Reward Model (TRM) maintains data quality by scoring synthesized trajectories on coherence, logical flow, and completeness. Under this approach, even incomplete but meaningful trajectories can be used for training. By bridging the gap between abstract instructions and the dynamic nature of GUIs, the framework improves the quality and diversity of training data while eliminating the need for human supervision.
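
To make the reversed order concrete, here is a minimal sketch of the idea: explore first, then derive instructions from what was observed. It is an illustration under stated assumptions, not the authors' implementation. The `Transition` and `SynthesizedTask` types, the dictionary-based toy UI, and `stub_annotator` (a deterministic stand-in for a model call) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Transition:
    """One exploration step: screen before, action taken, screen after."""
    pre_state: str
    action: str
    post_state: str


@dataclass(frozen=True)
class SynthesizedTask:
    low_level: str   # instruction describing the single observed step
    high_level: str  # broader goal the step could plausibly belong to


def explore(ui: Dict[Tuple[str, str], str], start: str, actions: List[str]) -> List[Transition]:
    """Apply actions in order and record every (pre, action, post) triple."""
    transitions, state = [], start
    for action in actions:
        # Unknown (state, action) pairs leave the screen unchanged.
        next_state = ui.get((state, action), state)
        transitions.append(Transition(state, action, next_state))
        state = next_state
    return transitions


def reverse_synthesize(t: Transition, annotate: Callable[[str], str]) -> SynthesizedTask:
    """Derive instructions from an observed transition, not from a preset task."""
    low = annotate(f"LOW|{t.pre_state}|{t.action}|{t.post_state}")
    high = annotate(f"HIGH|{low}")
    return SynthesizedTask(low, high)


def stub_annotator(prompt: str) -> str:
    """Deterministic stand-in for a model call (hypothetical, for illustration)."""
    kind, _, rest = prompt.partition("|")
    if kind == "LOW":
        pre, action, post = rest.split("|")
        return f"On '{pre}', perform {action} to reach '{post}'."
    return f"Complete a task that requires this step: {rest}"


# Toy UI (an assumption for this sketch): maps (screen, action) to the next screen.
TOY_UI = {
    ("home", "CLICK(settings)"): "settings",
    ("settings", "CLICK(wifi)"): "wifi_panel",
}

transitions = explore(TOY_UI, "home", ["CLICK(settings)", "CLICK(wifi)"])
tasks = [reverse_synthesize(t, stub_annotator) for t in transitions]
for task in tasks:
    print(task.low_level)
```

In a real system the annotator would be a vision-language model looking at screenshots, and the environment would be an emulator or browser rather than a lookup table. The point of the sketch is only the direction of data flow: tasks are derived from interactions instead of interactions being collected for tasks.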

    The OS-Genesis process consists of several components. First, the system autonomously explores dynamic GUI elements and records the transition between the pre-action and post-action states, which serves as the raw material for task synthesis. These transitions are then converted into detailed low-level instructions with the help of models such as GPT-4o. Those instructions are in turn expanded into high-level objectives that capture a user's overall intent, which gives the data semantic depth. Finally, the synthesized trajectories are evaluated by the Trajectory Reward Model, which uses a graded scoring scheme that emphasizes logical coherence and effective task completion. This keeps the data diverse and high in quality, providing a strong basis for training.
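
One way to use graded scores without discarding incomplete trajectories is to sample training data in proportion to the score rather than applying a pass/fail filter. The sketch below illustrates that idea only. The 1-to-5 scale, the hand-written `toy_reward` heuristic, and the proportional sampling rule are assumptions made for this example; the article does not specify the scoring scale or how scores feed into training.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Trajectory:
    task: str
    steps: List[str]
    completed: bool


def toy_reward(traj: Trajectory) -> int:
    """Hypothetical scorer standing in for a reward model.

    Returns an integer from 1 (poor) to 5 (excellent).
    """
    score = 1
    if traj.steps:
        score += 1  # some progress was recorded
    if traj.steps and len(set(traj.steps)) == len(traj.steps):
        score += 1  # no repeated steps (a crude coherence check)
    if traj.completed:
        score += 2  # completion is weighted most heavily
    return score


def sampling_weights(trajs: List[Trajectory], reward: Callable[[Trajectory], int]) -> List[float]:
    """Normalize scores so that they sum to 1."""
    scores = [reward(t) for t in trajs]
    total = sum(scores)
    return [s / total for s in scores]


def sample_batch(trajs: List[Trajectory], reward: Callable[[Trajectory], int],
                 k: int, seed: int = 0) -> List[Trajectory]:
    """Draw k trajectories with replacement, favoring higher-scored ones."""
    rng = random.Random(seed)
    return rng.choices(trajs, weights=sampling_weights(trajs, reward), k=k)


pool = [
    Trajectory("Turn on Wi-Fi", ["CLICK(settings)", "CLICK(wifi)", "CLICK(toggle)"], True),
    Trajectory("Turn on Wi-Fi", ["CLICK(settings)", "CLICK(wifi)"], False),
    Trajectory("Turn on Wi-Fi", ["CLICK(settings)", "CLICK(settings)"], False),
]

weights = sampling_weights(pool, toy_reward)
batch = sample_batch(pool, toy_reward, k=4)
print(weights)
```

With this rule the incomplete second trajectory still contributes to training, just less often than the completed one, while the looping third trajectory is sampled least.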

    Extensive experiments were conducted on benchmarks such as AndroidWorld and WebArena, which simulate complex, dynamic environments. The vision-language models Qwen2-VL and InternVL2 served as base models for training. Training targeted both high-level task planning and precise low-level action execution.

    OS-Genesis was validated on several benchmarks. On AndroidWorld, its success rates were nearly double those of task-driven methods, reflecting gains in both task planning and execution. On AndroidControl, the method performed well both at high-level autonomous planning and at low-level step-by-step execution, including on out-of-distribution examples, which indicates robustness. On WebArena, the approach consistently outperformed traditional baselines in complex, interactive environments. Taken together, these results indicate that OS-Genesis can generate high-quality, diverse trajectories that improve the effectiveness of GUI agents across general settings.

    OS-Genesis marks a significant step in the training of GUI agents because it overcomes key limitations of current data collection methods. Its interaction-driven methodology and reward-based evaluation yield high-quality, diverse training data that bridges the gap between abstract task instructions and dynamic GUI environments. This approach opens the way for progress in digital automation and AI research by enabling GUI agents to learn and adapt autonomously.


    Check out the Paper, GitHub and Project Page. All credit for this research goes to the researchers of this project.

    The post OS-Genesis: A Novel GUI Data Synthesis Pipeline that Reverses the Conventional Trajectory Collection Process appeared first on MarkTechPost.
