
    SWE-Bench Performance Reaches 50.8% Without Tool Use: A Case for Monolithic State-in-Context Agents

    May 18, 2025

Recent advances in language model (LM) agents have shown promise for automating intricate real-world tasks. These agents typically operate by proposing and executing actions through APIs, supporting applications such as software engineering, robotics, and scientific experimentation. As tasks grow more complex, LM agent frameworks have evolved to include multiple agents, multi-step retrieval, and tailored scaffolding to optimize performance. A central challenge lies in effectively exploring and understanding the environment, which has prompted the development of engineered scaffolds built on tools, memory mechanisms, and custom pipelines. However, most existing methods assume partial observability, requiring agents to collect observations incrementally. While this assumption holds in dynamic or unfamiliar environments, it is less applicable in fully observable settings such as SWE-bench, where all relevant information is accessible from the start.

    In software engineering, research on LM agents has focused on two main strategies: agent-based frameworks and structured pipelines. Agent-based systems, such as SWE-Agent and OpenHands CodeAct, allow LMs to interact autonomously with codebases, often through custom interfaces and retrieval tools. Other models like Moatless and AutoCodeRover enhance localization through search techniques, while SpecRover refines scaffolding design. Alternatively, structured pipelines—such as Agentless and CodeMonkey—decompose tasks into sequential phases like localization, repair, and validation. While these approaches depend on engineered components for performance, the current study proposes leveraging Long-Context LMs (LCLMs) to directly interpret the entire task environment. Advances in LCLM architecture and infrastructure now allow these models to outperform retrieval-augmented systems in many contexts, reducing reliance on complex external scaffolding. 

Researchers from Stanford, IBM, and the University of Toronto explored whether complex scaffolding is necessary for LM agents tackling tasks like SWE-bench. They show that simply using LCLMs such as Gemini-1.5-Pro, with proper prompting and no scaffolding, can achieve competitive performance, reaching 38% on SWE-bench Verified; Gemini-2.5-Pro, using the same simple setup, reaches 50.8%. Their work suggests that many complex agentic designs could be replaced with a single powerful LCLM, simplifying architecture and training. Additionally, a hybrid two-stage approach using Gemini-1.5-Pro and Claude-3.7 achieves a 48.6% solve rate, further supporting this simplified direction.

Traditional LM agents rely on interactive exploration because of partial observability, but many tasks, such as software debugging, are fully observable. The study proposes state-in-context agents that let LCLMs process the full or compressed environment state directly, bypassing the need for complex agentic scaffolding. For large codebases, a ranking-based compression step selects the most relevant files to fit within context limits. Two methods are introduced: DIRECTSOLVE, in which the LCLM solves the task using the full context, and SELECTSOLVE, in which the LCLM localizes relevant files for a short-context LM (SCLM) to solve. Both use targeted patch formats and validation to ensure accuracy and reduce hallucination; a minimal sketch of both strategies follows below.
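The paper's prompts and pipeline code are not reproduced here, so the following is only a minimal Python sketch of the two strategies as described above. Every helper is a hypothetical stand-in (the keyword-overlap relevance scorer, the prompt builder, the token budget, and the assumed `generate(str) -> str` interface on the model objects), not the authors' implementation.

```python
# Minimal sketch of DIRECTSOLVE and SELECTSOLVE under full observability:
# the whole repository is available up front. All helpers below are
# hypothetical stand-ins; models are assumed to expose generate(str) -> str.

def rank_files(issue: str, codebase: dict[str, str], budget_tokens: int) -> dict[str, str]:
    """Ranking-based compression: keep the files sharing the most terms with
    the issue text until a rough token budget is exhausted."""
    issue_terms = set(issue.lower().split())

    def score(text: str) -> int:
        return len(issue_terms & set(text.lower().split()))

    kept, used = {}, 0
    for path, text in sorted(codebase.items(), key=lambda kv: score(kv[1]), reverse=True):
        tokens = len(text) // 4  # crude token estimate
        if used + tokens > budget_tokens:
            break
        kept[path] = text
        used += tokens
    return kept

def build_prompt(issue: str, files: dict[str, str], task: str) -> str:
    """Relevant files go first, followed by the issue and the task instruction."""
    file_block = "\n\n".join(f"### {path}\n{text}" for path, text in files.items())
    return f"{file_block}\n\nIssue:\n{issue}\n\nTask: {task}"

def direct_solve(issue: str, codebase: dict[str, str], lclm) -> str:
    """DIRECTSOLVE: one long-context model reads the compressed repository
    state and emits a targeted patch directly."""
    files = rank_files(issue, codebase, budget_tokens=900_000)  # assumed budget under the context limit
    return lclm.generate(build_prompt(issue, files, "produce a targeted patch"))

def select_solve(issue: str, codebase: dict[str, str], lclm, sclm) -> str:
    """SELECTSOLVE: the long-context model only localizes the relevant files;
    a stronger short-context model writes the patch."""
    files = rank_files(issue, codebase, budget_tokens=900_000)
    listing = lclm.generate(build_prompt(issue, files, "list the files to edit, one per line"))
    focused = {p: codebase[p] for p in listing.splitlines() if p in codebase}
    return sclm.generate(build_prompt(issue, focused, "produce a targeted patch"))
```

Patch validation (applying the diff, running tests, retrying on failure), which the paper pairs with both methods, is omitted from the sketch for brevity.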

The experiments evaluate this simplified framework on the SWE-bench Verified benchmark, which comprises 500 real-world software engineering tasks. The proposed methods, DIRECTSOLVE and SELECTSOLVE, use LCLMs such as Gemini-1.5-Pro and Gemini-2.5-Pro; SELECTSOLVE additionally uses an SCLM (Claude-3.7-Sonnet) for patch generation. Results show that DIRECTSOLVE outperforms complex agentic approaches such as Agentless and CodeAct with minimal engineering, and SELECTSOLVE further improves accuracy by leveraging a stronger model for patching. Ablation studies highlight the importance of chain-of-thought (CoT) prompting, code restatement, and token-efficient context design. Positioning relevant files at the start of the prompt also improves performance, underscoring the limitations of long-context processing.
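As an illustration of how those ablation findings might translate into a prompt, here is a hypothetical skeleton; the wording is ours, not the prompt used in the paper.

```python
# Hypothetical prompt skeleton reflecting the ablation findings: relevant
# files placed first, an explicit chain-of-thought step, and a code
# restatement step before the patch. Illustrative wording only.

DIRECTSOLVE_PROMPT = """\
{ranked_files}

Issue:
{issue_text}

Instructions:
1. Reason step by step about where the fault lies (chain of thought).
2. Restate the exact lines of code you plan to modify.
3. Emit one targeted patch touching only those lines.
"""

def format_prompt(ranked_files: str, issue_text: str) -> str:
    """`ranked_files` should already be ordered most relevant first, since
    early placement of relevant files improved performance in the ablations."""
    return DIRECTSOLVE_PROMPT.format(ranked_files=ranked_files, issue_text=issue_text)
```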

In conclusion, the cost of the LCLM-based methods is currently higher than that of existing approaches like Agentless and CodeAct, averaging $2.60 per instance versus $0.25 and $0.87, respectively. However, rapidly falling inference costs and growing context lengths are making LCLMs more practical. Techniques such as KV caching significantly lower costs after the initial run, reducing the per-instance cost to about $0.725, although even slight codebase changes still limit caching benefits. The study also suggests that LCLMs can handle long interaction histories, reducing the need for complex memory and retrieval mechanisms. Notably, unscaffolded LCLMs can perform competitively on SWE-bench tasks.
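To put those figures in perspective, here is a quick back-of-the-envelope extrapolation over the 500 Verified instances; the per-instance prices are the ones quoted above, and the totals are simple multiplication.

```python
# Per-instance costs quoted in the article, extrapolated over the
# 500 SWE-bench Verified tasks. Pure arithmetic; no new data.
costs_per_instance = {
    "Agentless": 0.25,
    "CodeAct": 0.87,
    "LCLM, uncached": 2.60,
    "LCLM, KV-cached": 0.725,
}

n_tasks = 500
for name, cost in costs_per_instance.items():
    print(f"{name:>15}: ${cost:.3f}/task, ~${cost * n_tasks:,.0f} for the full benchmark")
```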


Check out the Paper. All credit for this research goes to the researchers of this project.

    The post SWE-Bench Performance Reaches 50.8% Without Tool Use: A Case for Monolithic State-in-Context Agents appeared first on MarkTechPost.



    Highlights

    The Secret Playbook: Leadership Lessons From Indian-Origin CEOs

    April 21, 2025

Indian-origin CEOs have become iconic figures in the global corporate world, steering some of the largest companies to unprecedented success. From Satya Nadella at Microsoft to Indra Nooyi at PepsiCo, their journeys are rich with insights that blend cultural heritage with innovative leadership.

Indian CEO Success Stories: What Sets Them Apart?

    Focus on Education and Lifelong Learning:

    Many Indian CEOs emphasize the importance of a solid educational foundation. For instance, Sundar Pichai credits his success to his rigorous engineering background and later business education at Stanford and Wharton.

    A notable example is the “Google for India” initiative, driven by Pichai’s understanding of local challenges, showcasing how education and cultural awareness intertwine.

The maxim that “education is the most powerful weapon you can use to change the world” resonates deeply within their ethos.

    Cultural Adaptability:

    Growing up in diverse environments, Indian-origin leaders develop the ability to adapt to new cultures and challenges. This adaptability has been instrumental in building global teams.

    For example, Indra Nooyi’s ability to navigate cultural differences was key in expanding PepsiCo’s presence globally.

    A “Growth Mindset”:

    Satya Nadella often speaks about the power of a growth mindset, stating, “Success can cause people to unlearn the habits that made them successful in the first place.”

    When Nadella took over as CEO, he revamped Microsoft’s culture to embrace cloud computing, transforming the company into one of the leaders in the tech industry.

    Empathy-Driven Leadership:

    Indra Nooyi’s leadership at PepsiCo was marked by her deep empathy for employees and stakeholders. Her motto, “Performance with Purpose,” highlights balancing business goals with societal impact.

    One notable initiative was her decision to introduce healthier snack options, aligning corporate objectives with public health.

    Key Leadership Lessons From Indian-Origin CEOs
    1. Visionary Thinking

    Case Study: Sundar Pichai

    As the CEO of Google and Alphabet, Pichai’s ability to envision the future of AI and sustainability drives innovation. His advice to aspiring leaders: “Take risks and don’t be afraid to fail.”

    Under his leadership, Google launched AI-focused solutions like Google Assistant and TensorFlow, setting industry benchmarks.

    2. Building Inclusive Teams

    Case Study: Arvind Krishna (IBM)

    Krishna’s focus on diversity has been pivotal at IBM. He often says, “Innovation requires diverse perspectives and inclusive leadership.”

    IBM’s groundbreaking AI technologies, like Watson, thrive due to inclusive and diverse team efforts.

    3. Humility and Hard Work

    Case Study: Shantanu Narayen (Adobe)

    Known for his humility, Narayen’s journey from Hyderabad to leading Adobe exemplifies persistence. His lesson: “Stay grounded and focused on solving real-world problems.”

    He spearheaded Adobe’s transition from packaged software to cloud-based solutions, significantly boosting revenue streams.

    4. Customer-Centric Approach

    Case Study: Ajay Banga (Mastercard)

    Banga’s strategy at Mastercard centered on customer satisfaction, leveraging technology to enhance user experiences. He advises: “Never lose sight of the customer’s voice.”

    His initiatives to promote financial inclusion globally have made Mastercard a leader in digital payments.

    How These Lessons Apply to Emerging Leaders

    Foster Resilience: Learn from setbacks and use them as stepping stones. For example, embracing constructive feedback can turn a potential weakness into a strength.

    Prioritize People: Build strong relationships with your team and stakeholders. Leaders like Indra Nooyi have demonstrated that understanding team dynamics enhances productivity.

    Think Globally: Embrace diverse perspectives to drive innovation. Sundar Pichai’s global vision has been instrumental in Google’s success.

    Invest in Growth: Dedicate time to self-improvement and professional development. Whether through formal education or self-taught skills, continuous growth is essential.

    The “Indian CEO Success Stories” Checklist

    Develop a Growth Mindset

    Read extensively and engage in lifelong learning.

    Embrace challenges and adapt to changing environments.

    Example: Nadella’s embrace of cloud technology transformed Microsoft.

    Cultivate Empathy

    Prioritize team well-being and societal impact.

    Actively listen to employees and customers.

    Example: Nooyi’s introduction of healthier snack lines at PepsiCo.

    Be Visionary

    Identify emerging trends and prepare for future challenges.

    Create a long-term strategy that aligns with core values.

    Example: Pichai’s AI-driven initiatives at Google.

    Stay Grounded

    Focus on solving tangible problems.

    Practice humility regardless of success.

    Example: Narayen’s successful cloud transition strategy at Adobe.

Conclusion

The success stories of Indian-origin CEOs are more than inspirational narratives; they are playbooks for leadership in a globalized world. By adopting their principles of resilience, empathy, and visionary thinking, aspiring leaders can carve their paths to success.

For a comprehensive guide, download our free checklist and embark on your journey to emulate the strategies of these global trailblazers.
