
    CodeEditorBench: A Machine Learning System for Evaluating the Effectiveness of Large Language Models (LLMs) in Code Editing Activities

    April 9, 2024

    Demand for help with coding tasks has driven rapid advances in Large Language Models (LLMs), and code editing is a growing focus. LLMs built specifically for coding are applied to a variety of tasks, including code optimisation and repair. They are becoming increasingly popular as programming tools, yet most evaluation techniques concentrate on code generation and overlook the crucial role that code editing plays in software development.

    In recent research, a team from the Multimodal Art Projection Research Community, University of Waterloo, HKUST, University of Manchester, Tongji University, and Vector Institute has introduced CodeEditorBench, an evaluation framework designed to assess how well LLMs perform across a range of code editing tasks: debugging, translating, polishing, and requirement switching.

    In contrast to other benchmarks that primarily concentrate on code creation, CodeEditorBench emphasises real-world applications and pragmatic elements of software development. The team has selected a variety of coding scenarios and challenges from five distinct sources, covering a broad spectrum of programming languages, degrees of difficulty, and editing assignments. By doing this, they have made sure that the evaluation takes into account the variety and complexity of difficulties found in actual coding environments.
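
    To make the idea of scoring code edits concrete, the following is a minimal sketch of how an edited program could be checked against test cases and how results could be grouped by editing category. It is an illustration only, not CodeEditorBench's actual harness or metric: the `EditTask` fields, the category labels, the pass-rate measure, and the two toy tasks are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class EditTask:
    """Hypothetical record for one code editing problem."""
    category: str                        # e.g. "debug", "requirement_switch"
    source_code: str                     # code handed to the model
    instruction: str                     # what the edit should achieve
    entry_point: str                     # function the tests will call
    tests: List[Tuple[tuple, Any]]       # (arguments, expected result) pairs


def passes_tests(candidate_code: str, task: EditTask) -> bool:
    """Run the edited code and check every test case.

    NOTE: exec() on untrusted model output is unsafe; a real harness
    would run candidates in a sandbox with time and memory limits.
    """
    namespace: Dict[str, Any] = {}
    try:
        exec(candidate_code, namespace)
        func = namespace[task.entry_point]
        return all(func(*args) == expected for args, expected in task.tests)
    except Exception:
        # Syntax errors, missing functions, and crashes all count as failures.
        return False


def pass_rate_by_category(
    tasks: List[EditTask], edit_fn: Callable[[EditTask], str]
) -> Dict[str, float]:
    """Fraction of tasks solved in each category by the given editor."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for task in tasks:
        total[task.category] += 1
        if passes_tests(edit_fn(task), task):
            passed[task.category] += 1
    return {category: passed[category] / total[category] for category in total}


# Two toy tasks, invented for illustration (not taken from the benchmark).
TOY_TASKS = [
    EditTask(
        category="debug",
        source_code="def add(a, b):\n    return a - b\n",
        instruction="Fix the bug so that add returns the sum.",
        entry_point="add",
        tests=[((1, 2), 3), ((0, 0), 0)],
    ),
    EditTask(
        category="requirement_switch",
        source_code="def area(w, h):\n    return w * h\n",
        instruction="Provide perimeter(w, h) instead of area(w, h).",
        entry_point="perimeter",
        tests=[((2, 3), 10)],
    ),
]


def toy_model(task: EditTask) -> str:
    """Stand-in for an LLM: it only knows how to repair the add() bug."""
    return task.source_code.replace("a - b", "a + b")


if __name__ == "__main__":
    print(pass_rate_by_category(TOY_TASKS, toy_model))
```

    Reporting results per category rather than as a single average is what would let an evaluation of this kind show a model doing well on one editing task and poorly on another.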

    The team's evaluation, which covered 19 distinct LLMs, revealed some intriguing trends. On CodeEditorBench, closed-source models, specifically Gemini-Ultra and GPT-4, outperformed open-source models. This emphasises how much model architecture and training data matter for performance, particularly as results vary with problem category and prompt sensitivity.

    The team has summarized their primary contributions as follows.

    The first goal of CodeEditorBench is to offer a uniform approach for evaluating LLMs. The framework includes tools for additional analyses, training, and visualisation. To promote further research into LLM capabilities, the team has stated that all evaluation-related data will be openly accessible. More evaluation measures will be added in the future to make the assessment more comprehensive.

    The second aim is to map the current state of LLMs. OpenCI-DS-33B is the most effective base model available to the public, followed by OpenCI-DS-6.7B and DS-33B-INST. Models that are not publicly accessible, such as Gemini, GPT, and GLM, usually perform better than those that are. OpenCI-DS-33B and DS-33B-INST, two instruction-tuned models with over 30 billion parameters, close this performance gap.

    The third goal of CodeEditorBench is to draw attention to the shortcomings of LLMs, especially when it comes to rewriting and revising code. Although GPT-4 performs admirably in three of the four categories, its code-polishing abilities are noticeably lacking. Similarly, Gemini-Ultra struggles with requirement switching, that is, editing code to meet changed requirements. The team highlights these limitations so that they can be addressed in LLM training and development.

    In conclusion, CodeEditorBench’s main objective is to spur advances in LLMs by providing a strong platform for thoroughly assessing code editing capabilities.

    All credit for this research goes to the researchers of this project.


    Ge Zhang (@GeZhang86038849), April 5, 2024: "Excited to share our latest work: 'CodeEditorBench: Evaluating Code Editing Capability of Large Language Models'! https://t.co/GckeztzIbT Highlights of the CodeEditorBench: 8K meticulously collected code editing questions from five sources: namely…"

    The post CodeEditorBench: A Machine Learning System for Evaluating the Effectiveness of Large Language Models (LLMs) in Code Editing Activities appeared first on MarkTechPost.
