
    Can Coding Agents Improve Themselves? Researchers from University of Bristol and iGent AI Propose SICA (Self-Improving Coding Agent) that Iteratively Enhances Its Own Code and Performance

    April 30, 2025

The development of agentic systems—LLMs embedded within scaffolds capable of tool use and autonomous decision-making—has made significant progress. Yet most implementations today rely on fixed, hand-crafted orchestration strategies. These designs are inherently constrained, limiting the agent’s adaptability to new tasks and environments. As models grow in capability, the rigidity of their execution frameworks becomes a bottleneck, especially in domains such as software engineering, where task complexity and variability demand a more flexible system.

    In response, researchers from the University of Bristol and iGent AI have introduced SICA (Self-Improving Coding Agent)—a novel agent architecture designed to iteratively enhance its own performance by modifying its underlying code. Unlike prior methods, such as ADAS, which split responsibilities between a meta-agent and a target-agent, SICA unifies these roles. The same agent that performs the task is also responsible for evaluating past performance, identifying shortcomings, and updating its own implementation. This integration allows for a continuous loop of self-directed improvement without external intervention.

    Architecture and Mechanism of Self-Improvement

    SICA is built upon a minimal, extensible base agent equipped with tools to manipulate its codebase, navigate directories, execute shell commands, and invoke sub-agents. Its architecture follows a loop: evaluate, select, revise. At each iteration, the agent benchmarks its own performance on predefined tasks, stores results, and selects the most effective prior version to serve as the basis for further improvement.
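The evaluate-select-revise loop described above can be sketched in a few lines. Everything here—`AgentVersion`, its fields, and the toy quality score—is an illustrative stand-in for the authors' actual implementation, which benchmarks real coding tasks and edits real source files:

```python
import random

# Minimal sketch of SICA's evaluate -> select -> revise loop.
# "AgentVersion" and its toy "quality" score are hypothetical stand-ins;
# in the real system, evaluation means running coding benchmarks and
# revision means the agent rewriting its own codebase.

class AgentVersion:
    def __init__(self, generation, quality):
        self.generation = generation
        self.quality = quality  # stands in for benchmarked performance

    def revise_own_code(self):
        # Simulate a self-edit that may or may not help; SICA keeps every
        # version in an archive, so regressions can be selected away later.
        delta = random.uniform(-0.05, 0.15)
        return AgentVersion(self.generation + 1, min(1.0, self.quality + delta))

def self_improve(iterations=5):
    archive = [AgentVersion(generation=0, quality=0.2)]
    for _ in range(iterations):
        # Evaluate + select: the best archived version becomes the base.
        best = max(archive, key=lambda a: a.quality)
        # Revise: the selected agent produces a new version of itself.
        archive.append(best.revise_own_code())
    return max(archive, key=lambda a: a.quality)
```

The key design choice this sketch preserves is that selection operates over an archive of all prior versions, so a bad self-edit never permanently degrades the agent.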

    The agent evaluates performance using a utility function that combines accuracy, time, and cost metrics. Key components include:

    • Sub-agent structure for decomposing problems and managing context within LLM constraints.
    • Asynchronous oversight, a monitoring LLM thread that ensures the agent remains on-task and halts execution in cases of non-progress or divergence.
    • Self-editing capabilities, with tools such as SmartEditor, AST-based symbol locators, and diff summarizers that enable precise modifications to the agent’s behavior.
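One plausible form for the utility function mentioned above is a weighted combination that rewards accuracy and penalizes cost and wall-clock time. The weights and normalization caps below are illustrative assumptions, not the paper's exact values:

```python
# Hypothetical utility in the spirit of SICA's: higher is better,
# bounded in [0, 1]. The weights (w_acc, w_cost, w_time) and the
# caps (cost_cap, time_cap) are assumed for illustration.

def utility(accuracy, cost_usd, time_s,
            w_acc=0.5, w_cost=0.25, w_time=0.25,
            cost_cap=10.0, time_cap=300.0):
    """Combine accuracy, dollar cost, and runtime into one score."""
    cost_term = 1.0 - min(cost_usd, cost_cap) / cost_cap   # cheaper runs score higher
    time_term = 1.0 - min(time_s, time_cap) / time_cap     # faster runs score higher
    return w_acc * accuracy + w_cost * cost_term + w_time * time_term
```

Folding cost and latency into the selection signal is what lets the loop optimize resource efficiency, not just benchmark accuracy, as reported in the evaluation below.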

    This structure allows the agent to conduct controlled experiments on its own design and deploy updates that demonstrably improve outcomes.
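The asynchronous-oversight component can be approximated with a watchdog that halts execution when no progress is reported within a deadline. In SICA the overseer is itself an LLM judging execution traces; the simple timestamp check below is a stand-in for that judgment, and all names are hypothetical:

```python
import threading
import time

# Sketch of asynchronous oversight: a background thread that signals a
# halt when the agent stops reporting progress. In SICA proper, an LLM
# monitors traces for divergence; here a stall timeout stands in.

class Watchdog:
    def __init__(self, stall_timeout=2.0, poll=0.05):
        self.stall_timeout = stall_timeout
        self.poll = poll
        self.last_progress = time.monotonic()
        self.halted = threading.Event()   # set when the agent should stop
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def report_progress(self):
        # The agent calls this after each meaningful step.
        self.last_progress = time.monotonic()

    def stop(self):
        self._stop.set()

    def _run(self):
        while not self._stop.is_set():
            if time.monotonic() - self.last_progress > self.stall_timeout:
                self.halted.set()         # non-progress detected: halt
                return
            time.sleep(self.poll)
```

The agent's main loop would check `halted` between steps and abort cleanly, which is the property the paper relies on for safety during self-modification.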

    Empirical Evaluation

    The researchers evaluated SICA on several code-related benchmarks, including a subset of SWE Bench Verified, LiveCodeBench, and synthetic tasks focused on file editing and symbol location. Results indicate measurable gains across iterations. For instance, accuracy on SWE Bench Verified increased from 17% to 53%, and file editing performance improved from 82% to 94%.

    These improvements were not limited to benchmark scores. The agent also optimized execution latency and resource efficiency, reducing average cost and time per task. Notably, improvements were not the result of weight updates to the underlying LLM but were achieved through changes in tool orchestration, file management strategies, and problem decomposition heuristics.

    However, gains were less pronounced on reasoning-dominant tasks such as AIME and GPQA. In these cases, the performance of the base LLM (e.g., o3-mini) already approached the task ceiling, limiting the marginal benefit of additional scaffolding. Moreover, introducing certain tool-based reasoning steps appeared to disrupt rather than enhance the performance of pretrained reasoning models, suggesting a need for more integrated co-training between agent logic and model behavior.

    Conclusion

    The SICA framework illustrates a concrete path toward autonomous improvement in agent systems. By consolidating execution and self-editing within a single agent, the system avoids many pitfalls of manual design and enables iterative refinement driven by empirical feedback. The results show that this approach is viable, particularly in domains with long-horizon, tool-mediated tasks such as software engineering.

    While there are clear boundaries to the effectiveness of scaffold-only improvements—especially for tasks dominated by pure reasoning—the research establishes a foundation for future work in hybrid optimization, where both the model and the agent design evolve jointly. SICA also introduces practical considerations for safety and observability in self-improving systems, using LLM-based overseers and structured execution traces to ensure transparency and control.


Check out the Paper and GitHub Page.


    The post Can Coding Agents Improve Themselves? Researchers from University of Bristol and iGent AI Propose SICA (Self-Improving Coding Agent) that Iteratively Enhances Its Own Code and Performance appeared first on MarkTechPost.

