Transforming Software Development with Multi-Agent Collaboration: CodeStory’s Aide Framework Sets State-of-the-Art on SWE-Bench-Lite with 40.3% Accepted Solutions

Recent developments in the field of software engineering have raised the bar for productivity and teamwork. A team of researchers from CodeStory has developed a multi-agent coding framework called Aide that achieved 40.3% accepted solutions on the SWE-Bench-Lite benchmark, establishing a new state of the art. Designed to integrate smoothly into development environments, the framework aims to change the way developers work with code.

https://aide.dev/blog/sota-on-swe-bench-lite

At the core of the architecture is the idea of many agents, each in charge of a particular code symbol such as a class, function, enum, or type. This level of granularity lets each agent concentrate on a single unit of work while communicating with the other agents in natural language. The Language Server Protocol (LSP) supports the agents’ communication, helping them exchange information accurately and efficiently.
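As a rough illustration of the one-agent-per-symbol idea, the sketch below uses Python's standard `ast` module to list the top-level classes and functions in a source string and create one placeholder agent for each. The `SymbolAgent` class, the `spawn_agents` function, and the sample source are hypothetical names invented for this example; the article does not describe how Aide itself enumerates symbols or represents its agents.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class SymbolAgent:
    """Hypothetical stand-in for an agent that owns one code symbol."""
    name: str
    kind: str          # "class" or "function"
    start_line: int    # one-based, as reported by ast
    end_line: int
    inbox: list = field(default_factory=list)

    def receive(self, sender: str, message: str) -> None:
        # Natural-language messages from other agents are simply queued.
        self.inbox.append((sender, message))


def spawn_agents(source: str) -> dict:
    """Create one agent per top-level class or function in `source`."""
    agents = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            kind = "class"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        else:
            continue  # imports, assignments, etc. get no agent in this sketch
        agents[node.name] = SymbolAgent(
            node.name, kind, node.lineno, node.end_lineno
        )
    return agents


# Illustrative source only; not taken from the benchmark.
SAMPLE = '''
class Cart:
    def total(self):
        return 0

def checkout(cart):
    return cart.total()
'''

agents = spawn_agents(SAMPLE)
# One agent asks another a question in plain language.
agents["Cart"].receive("checkout", "Please confirm that total() returns a number.")
```

Keeping each agent's scope to a single symbol's line range is what would let an agent reason about one unit of work at a time; how messages are actually routed and answered is left out of this sketch.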

In practice, up to 30 agents can be active at once during a single run, sharing information and collaborating on decisions. The framework’s capabilities were demonstrated by its performance on the SWE-Bench-Lite benchmark. The run used Claude Sonnet 3.5 and GPT-4o, with an editor environment for the agents built using Pyright and Jedi. GPT-4o excelled at code editing, while Sonnet 3.5, which is known for its robust agentic behaviors, was helpful in organizing and navigating the codebase.

The agentic behavior of Sonnet 3.5 was especially significant. It was the first model to propose splitting functions apart instead of making already complex ones more complex, showing a sophisticated understanding of maintainability and code structure. This behavior, along with GPT-4o’s code editing abilities, made the framework perform noticeably better than earlier versions.

SWE-Bench-Lite was selected because it replicates real-world coding difficulties, giving agents a reliable testing environment. The benchmark configuration comprised a mock editor harness with Pyright for diagnostics and Jedi for LSP features, enabling agents to obtain information and run tests quickly without taxing system resources.
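To make the diagnostics side of such a harness concrete, here is a minimal sketch of how a harness might collect errors from a Pyright report. It assumes the JSON shape produced by `pyright --outputjson`, namely a `generalDiagnostics` list whose entries carry `file`, `severity`, `message`, and a zero-based `range`. The `errors_by_file` function is a hypothetical helper and the sample report is hand-written, not output from a real run; nothing here reflects how the team's harness is actually implemented.

```python
import json
from collections import defaultdict

# Hand-written sample in the assumed Pyright JSON shape; not real output.
SAMPLE_REPORT = json.dumps({
    "generalDiagnostics": [
        {
            "file": "cart.py",
            "severity": "error",
            "message": "Cannot access attribute totl",
            "range": {
                "start": {"line": 6, "character": 11},
                "end": {"line": 6, "character": 15},
            },
        },
        {
            "file": "cart.py",
            "severity": "warning",
            "message": "Import os is not accessed",
            "range": {
                "start": {"line": 0, "character": 7},
                "end": {"line": 0, "character": 9},
            },
        },
    ]
})


def errors_by_file(report_json: str) -> dict:
    """Group error-level diagnostics by file as (one-based line, message)."""
    report = json.loads(report_json)
    grouped = defaultdict(list)
    for diag in report.get("generalDiagnostics", []):
        if diag.get("severity") != "error":
            continue  # warnings and information are ignored in this sketch
        line = diag["range"]["start"]["line"] + 1  # zero-based -> one-based
        grouped[diag["file"]].append((line, diag["message"]))
    return dict(grouped)


errors = errors_by_file(SAMPLE_REPORT)
```

A harness could then hand each entry to whichever agent owns the symbol covering that line, which is one plausible way for agents to obtain information without opening a full editor.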

The benchmarking process yielded important lessons, one of which was the significance of agent collaboration. Working together, agents that were each in charge of a different code symbol completed tasks quickly and often corrected unrelated problems such as lint errors or TODOs along the way. This cooperative approach not only improved code quality but also demonstrated the ability of agentic systems to handle complicated coding tasks on their own.

The team has shared that a few obstacles remain before this multi-agent framework can be fully integrated into development environments. Research is underway to ensure smooth communication between human developers and agents, handle concurrent code modifications, and preserve code stability. The team is also working to optimize the framework’s performance, specifically inference speed and the cost of intelligence.

The team’s ultimate objective is to extend the capabilities of human developers rather than to replace them. By supplying a swarm of specialized agents, the goal is to improve the accuracy and efficiency of the software development process, freeing developers to work on more complex problems while the agents handle the finer details.

The post Transforming Software Development with Multi-Agent Collaboration: CodeStory’s Aide Framework Sets State-of-the-Art on SWE-Bench-Lite with 40.3% Accepted Solutions appeared first on MarkTechPost.
