Large Language Models (LLMs) have revolutionized software engineering, demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced autonomous software agents based on LLMs for end-to-end development tasks, these systems are typically designed for specific Software Engineering (SE) tasks. Researchers from FPT Software AI Center, Viet Nam, introduce HyperAgent, a novel generalist multi-agent system designed to address a wide spectrum of SE tasks across different programming languages by mimicking human developers’ workflows.
HyperAgent comprises four specialized agents—Planner, Navigator, Code Editor, and Executor—managing the full lifecycle of SE tasks, from initial conception to final verification. Through extensive evaluations, HyperAgent demonstrates competitive performance across diverse SE tasks:
GitHub issue resolution: 25.01% success rate on SWE-Bench-Lite and 31.40% on SWE-Bench-Verified, competitive with existing methods such as AutoCodeRover, SWE-Agent, and Agentless.
Code generation at repository scale (RepoExec): 53.3% accuracy when navigating through codebases and retrieving correct context.
Fault localization and program repair (Defects4J): 59.70% accuracy in fault localization and successful fixes for 29.8% of Defects4J bugs, achieving state-of-the-art (SOTA) performance on both tasks.
This work represents a significant advancement towards versatile, autonomous agents capable of handling complex, multi-step SE tasks across various domains and languages. HyperAgent’s performance demonstrates its potential to transform AI-assisted software development practices, offering a more adaptable and comprehensive solution than task-specific alternatives.
Methodology
HyperAgent is inspired by the way developers typically approach a software engineering task. That workflow consists of four iterative phases: Analysis & Plan, where developers understand requirements and formulate a flexible strategy; Feature Localization, which involves identifying relevant code components in the repository; Edition, where developers implement changes, add functionality, and write tests while maintaining code quality; and Execution, which includes testing and verification of the modifications. These phases are repeated as necessary until the task is completed satisfactorily, with the process adapting to the specific task requirements and the developer’s expertise.
HyperAgent is organized around four primary agents: Planner, Navigator, Code Editor, and Executor. Each agent corresponds to one phase of the workflow above, though the way an agent carries out its phase may differ slightly from how a human developer would approach the same step.
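To make the iterative structure concrete, here is a minimal sketch of a plan, locate, edit, execute loop. It is an illustration only and not HyperAgent's actual code: `TaskState`, `run_task`, and `make_stub_agents` are hypothetical names, and the agents are stubbed as plain functions where a real system would call an LLM and developer tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskState:
    """Shared record of what has happened so far on one task (hypothetical)."""
    description: str
    log: List[str] = field(default_factory=list)
    verified: bool = False


# Each child agent is modelled as a function that reads the shared state
# and returns a short text report. Real agents would call an LLM and tools.
Agent = Callable[[TaskState], str]


def run_task(description: str, agents: Dict[str, Agent], max_rounds: int = 3) -> TaskState:
    """Repeat plan -> locate -> edit -> execute until verification passes."""
    state = TaskState(description)
    for round_no in range(1, max_rounds + 1):
        # The planner step is reduced to a log entry in this sketch.
        state.log.append(f"planner: round {round_no} for '{description}'")
        for role in ("navigator", "editor", "executor"):
            report = agents[role](state)
            state.log.append(f"{role}: {report}")
        if state.verified:
            break  # the executor confirmed the change, so stop iterating
    return state


def make_stub_agents(pass_on_round: int) -> Dict[str, Agent]:
    """Stub agents; the executor only reports success from the given round on."""
    rounds = {"count": 0}

    def navigator(state: TaskState) -> str:
        return "located candidate file"

    def editor(state: TaskState) -> str:
        return "applied patch"

    def executor(state: TaskState) -> str:
        rounds["count"] += 1
        state.verified = rounds["count"] >= pass_on_round
        return "tests passed" if state.verified else "tests failed"

    return {"navigator": navigator, "editor": editor, "executor": executor}


if __name__ == "__main__":
    final = run_task("fix failing unit test", make_stub_agents(pass_on_round=2))
    print("\n".join(final.log))
```

The point of the sketch is the control flow: the loop ends as soon as the execution step verifies the change, and otherwise returns to planning, up to a round limit that is an assumption of this example.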
The design emphasizes three main advantages over existing methods:
Generalizability: The framework is designed to easily adapt to a wide range of tasks with minimal configuration changes and little additional effort required to implement new modules into the system.
Efficiency: Each agent is optimized to manage processes with varying levels of complexity, requiring different degrees of intelligence from LLMs. For example, a lightweight and computationally efficient LLM can be employed for navigation, which, while less complex, involves the highest token consumption. Conversely, more complex tasks, such as code editing or execution, require more advanced LLM capabilities.
Scalability: The framework is built to scale in real-world deployments where a task breaks down into a very large number of subtasks. For instance, a complex task in the SWE-bench benchmark may require considerable time for an agent-based system to complete, and HyperAgent is designed to handle such scenarios efficiently.
These advantages allow HyperAgent to effectively tackle a broad spectrum of software engineering tasks while maintaining efficiency and scalability.
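The efficiency argument can be illustrated with a small cost model. The sketch below is hypothetical: the model names, prices, and token counts are made-up placeholders rather than figures from the paper, and assigning the Planner to the stronger model is an assumption, since the article only states that navigation can use a lightweight LLM while editing and execution need a more capable one.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelTier:
    name: str
    price_per_1k_tokens: float  # illustrative placeholder, not a real price


# Hypothetical tiers; neither name refers to a real model.
LIGHT = ModelTier("light-model", 0.5)
STRONG = ModelTier("strong-model", 5.0)

# Navigation gets the cheap model; editing and execution get the strong one.
# The planner assignment is an assumption made for this example.
ROLE_TO_MODEL: Dict[str, ModelTier] = {
    "planner": STRONG,
    "navigator": LIGHT,
    "editor": STRONG,
    "executor": STRONG,
}


def estimate_cost(tokens_by_role: Dict[str, int], role_to_model: Dict[str, ModelTier]) -> float:
    """Sum the cost of each role's tokens under its assigned model tier."""
    return sum(
        tokens / 1000 * role_to_model[role].price_per_1k_tokens
        for role, tokens in tokens_by_role.items()
    )


# Made-up token counts in which navigation dominates consumption.
EXAMPLE_TOKENS = {"planner": 10_000, "navigator": 100_000, "editor": 20_000, "executor": 10_000}

if __name__ == "__main__":
    mixed = estimate_cost(EXAMPLE_TOKENS, ROLE_TO_MODEL)
    all_strong = estimate_cost(EXAMPLE_TOKENS, {role: STRONG for role in EXAMPLE_TOKENS})
    print(f"mixed tiers: {mixed:.1f}, strong model everywhere: {all_strong:.1f}")
```

With these placeholder numbers the mixed assignment costs 250.0 units against 700.0 when the strong model is used everywhere. The gap comes entirely from moving the token-heavy navigation step to the cheaper tier.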
Conclusion
HyperAgent is a generalist multi-agent system designed to address a wide range of software engineering tasks. By closely mimicking typical software engineering workflows, HyperAgent incorporates stages for analysis, planning, feature localization, code editing, and execution/verification. Extensive evaluations across diverse benchmarks, including GitHub issue resolution, code generation at repository-level scale, and fault localization and program repair, demonstrate that HyperAgent not only matches but often exceeds the performance of specialized systems. The success of HyperAgent highlights the potential of generalist approaches in software engineering, offering a versatile tool that can adapt to various tasks with minimal configuration changes. Its design emphasizes generalizability, efficiency, and scalability, making it well-suited for real-world software development scenarios where tasks can vary significantly in complexity and scope.
Future work could explore integrating HyperAgent with existing development environments and version control systems, investigating its potential in specialized domains like security-focused code review or performance optimization, enhancing its explainability, and continually updating its knowledge base. These advancements could further streamline the software engineering process, expand HyperAgent’s applicability, improve trust among developers, and ensure its long-term relevance in the rapidly evolving field of software engineering.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Thanks to FPT Software AI Center for the thought leadership and resources for this article. FPT Software AI Center has supported us in this content/article.
The post FPT Software AI Center Introduces HyperAgent: A Groundbreaking Generalist Agent System to Resolve Various Software Engineering Tasks at Scale, Achieving SOTA Performance on SWE-Bench and Defects4J appeared first on MarkTechPost.