Benchmarking AI-assisted developers (and their tools) for superior AI governance

September 23, 2025

A quick browse of LinkedIn, DevTok, and X would lead you to believe that almost every developer has jumped aboard the vibe coding hype train with full gusto. That isn’t far-fetched: 84% of developers confirm they are currently using (or planning to use) AI coding tools in their daily workflows. Even so, a full surrender to autonomous vibe-coding agents remains unusual; Stack Overflow’s 2025 AI Survey revealed that most respondents (72%) are not (yet) vibe coding. Still, adoption is trending upwards, and AI currently generates 41% of all code, for better or worse.

Tools like Cursor and Windsurf represent the latest generation of AI coding assistants, each with a powerful autonomous mode that can make decisions independently based on preset parameters. The speed and productivity gains are undeniable, but a worrying trend is emerging: many of these tools are being deployed in enterprise environments whose teams are not equipped to address the security issues inherent in their use. Human governance is paramount, yet too few security leaders are modernizing their security programs to adequately shield against the risk of AI-generated code.

If the tech stack lacks tools that track not only developer security proficiency but also the trustworthiness of the approved AI coding companions each developer uses, then efforts to uplift the overall security program, and the developers working within it, will fall short of the data insights needed to effect change.

AI and human governance should be a priority

The drawing card of agentic models is their ability to work autonomously and make decisions independently. Embedding them into enterprise environments at scale without appropriate human governance will inevitably introduce security issues that are neither especially visible nor easy to stop.

Long-standing security problems like sensitive data exposure and insufficient logging and monitoring remain, and emerging threats like memory poisoning and tool poisoning are not issues to take lightly. CISOs must take steps to reduce developer risk, and provide continuous learning and skills verification within their security programs, in order to safely adopt agentic AI.
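
Even a thin layer of human governance around an agent’s tool use illustrates the point. The Python sketch below is a hypothetical wrapper, assuming an agent whose tools are plain callables: the allowlist counters tool poisoning, and the audit log addresses insufficient logging and monitoring. The tool names and log format are illustrative assumptions.

```python
import json
import logging
import time

# Audit log for every tool invocation the agent attempts.
logging.basicConfig(filename="agent_audit.log", level=logging.INFO)

# Tools vetted by a human reviewer; anything else is refused outright.
APPROVED_TOOLS = {"read_file", "run_tests"}

def governed_call(tool_name, tool_fn, **kwargs):
    """Run an agent tool only if it is approved, and record the call."""
    if tool_name not in APPROVED_TOOLS:
        logging.warning("blocked unapproved tool: %s", tool_name)
        raise PermissionError(f"{tool_name} is not on the approved tool list")
    logging.info(json.dumps({"ts": time.time(), "tool": tool_name,
                             "args": repr(kwargs)}))
    return tool_fn(**kwargs)
```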

Powerful benchmarking lights your developers’ path

It’s very difficult to make impactful, positive improvements to a security program based solely on anecdotal accounts, limited feedback, and other subjective data points. These types of data, while helpful in correcting the more glaring faults (such as a tool that keeps failing, or personnel time wasted on a low-value, frustrating task), will do little to lift the program to a new level. Sadly, the “people” part of an enterprise security (or, indeed, Secure by Design) initiative is notoriously tricky to measure, and too often neglected as a piece of the puzzle that must be solved as a priority.

This is where governance tools that deliver data points on individual developer security proficiency, categorized by language, framework, and even industry, can make the difference between yet another flat training-and-observability exercise and proper developer risk management. In the latter, the tooling collects the insights needed to plug knowledge gaps, route security-proficient developers to the most sensitive projects, and, importantly, monitor and approve the tools they use each day, such as AI coding companions.
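
To make that concrete, here is a minimal sketch of the kind of records such a governance tool might aggregate. The schema, names, and the 0.8 threshold are illustrative assumptions, not any vendor’s actual data model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical assessment results: one record per secure-coding task attempt.
results = [
    {"dev": "alice", "lang": "python", "category": "sqli", "passed": True},
    {"dev": "alice", "lang": "python", "category": "xss",  "passed": False},
    {"dev": "bob",   "lang": "java",   "category": "sqli", "passed": True},
]

def proficiency_by_category(records):
    """Average pass rate per (developer, language, vulnerability category)."""
    buckets = defaultdict(list)
    for r in records:
        buckets[(r["dev"], r["lang"], r["category"])].append(int(r["passed"]))
    return {key: mean(vals) for key, vals in buckets.items()}

# Developers above a chosen threshold could be routed to sensitive projects.
scores = proficiency_by_category(results)
cleared = [key for key, rate in scores.items() if rate >= 0.8]
```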

Assessment of agentic AI coding tools and LLMs

Three years on, we can confidently conclude that not all AI coding tools are created equal. More studies are emerging that help differentiate the strengths and weaknesses of each model across a variety of applications. Sonar’s recent study on the coding personalities of each model was quite eye-opening, revealing the different traits of models like Claude Sonnet 4, OpenCoder-8B, Llama 3.2 90B, GPT-4o, and Claude Sonnet 3.7, with insight into how their individual approaches to coding affect code quality and, subsequently, associated security risk. Semgrep’s deep dive into the capabilities of AI coding agents for detecting vulnerabilities also yielded mixed results: its findings generally demonstrated that a security-focused prompt can already identify real vulnerabilities in real applications, but, depending on the vulnerability class, a high volume of false positives created noisy, less valuable results.
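
As an illustration of what a “security-focused prompt” can look like in practice, here is a minimal sketch using the OpenAI Python SDK. The model name and prompt wording are assumptions for demonstration, not the prompts or models used in the studies above.

```python
from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SECURITY_PROMPT = (
    "You are a security reviewer. Examine the following code for concrete, "
    "exploitable vulnerabilities (e.g. injection, path traversal, SSRF). "
    "Report only findings you can justify with a specific line and payload; "
    "otherwise say 'no findings'. Keeping false positives low matters more "
    "than exhaustive coverage."
)

def review(snippet: str) -> str:
    """Ask the model for a security review of one code snippet."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice
        messages=[
            {"role": "system", "content": SECURITY_PROMPT},
            {"role": "user", "content": snippet},
        ],
    )
    return response.choices[0].message.content
```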

Our own benchmarking data supports many of Semgrep’s findings. We were able to show that the best LLMs perform comparably to proficient people on a range of limited secure coding tasks. However, there is a significant drop in consistency among LLMs across different stages of tasks, languages, and vulnerability categories. Generally, top developers with security proficiency outperform all LLMs, while average developers do not.
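
One simple way to quantify that inconsistency is to compare a model’s pass rate across vulnerability categories and measure the spread. The sketch below uses made-up numbers for illustration; it is not our benchmark data.

```python
from statistics import mean, pstdev

# Hypothetical per-task pass/fail results for one model, by category.
model_results = {
    "sqli": [1, 1, 1, 0, 1],
    "xss":  [1, 0, 0, 1, 0],
    "auth": [0, 0, 1, 0, 0],
}

rates = {cat: mean(runs) for cat, runs in model_results.items()}
spread = pstdev(rates.values())  # high spread = inconsistent across categories
print(rates, f"spread={spread:.2f}")
```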

With studies like these in mind, we must not lose sight of what we as an industry are allowing into our codebases. AI coding agents are gaining autonomy and general use, and they must be treated like any human with their hands on the tools: their security proficiency, access level, commits, and mistakes assessed with the same fervor applied to the human operating them, with no exceptions. How trustworthy is the output of the tool, and how security-proficient is its operator?

If security leaders cannot answer these questions and plan accordingly, the attack surface will continue to grow by the day. If you don’t know where the code is coming from, make sure it’s not going into any repository, with no exceptions.
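
One way to enforce that last rule mechanically is a commit-msg hook that rejects commits lacking a provenance declaration. The “Code-Provenance:” trailer below is a hypothetical team convention, not an established Git standard, and the approved origins are illustrative.

```python
import sys

# Origins vetted by the security team; edit to match your approved tooling.
ALLOWED = {"human", "cursor", "copilot"}

def main(msg_file):
    """Reject the commit unless its message declares an approved provenance."""
    with open(msg_file) as f:
        lines = f.read().splitlines()
    for line in lines:
        if line.lower().startswith("code-provenance:"):
            value = line.split(":", 1)[1].strip().lower()
            if value in ALLOWED:
                return 0
            print(f"commit rejected: unapproved provenance {value!r}",
                  file=sys.stderr)
            return 1
    print("commit rejected: missing Code-Provenance trailer", file=sys.stderr)
    return 1

if __name__ == "__main__":
    # Git invokes commit-msg hooks with the message file path as argv[1].
    sys.exit(main(sys.argv[1]))
```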

The post Benchmarking AI-assisted developers (and their tools) for superior AI governance appeared first on SD Times.

