    Unveiling the Potential of Large Language Models: Enhancing Feedback Generation in Computing Education

    May 17, 2024

    Feedback is crucial for student success, especially in large computing classes facing increasing demand. Automated tools that combine program analysis techniques and testing frameworks are gaining popularity, but they often fail to offer constructive suggestions. Recent advances in large language models (LLMs) show promise for delivering rapid, human-like feedback. However, concerns about the accuracy, reliability, and ethical implications of proprietary LLMs persist, motivating the exploration of open-source alternatives in computing education.

    Automated feedback generation in computing education has been a persistent challenge: existing tools focus mainly on identifying mistakes rather than offering constructive guidance. LLMs present a promising solution. Recent research has explored LLMs for automated feedback generation but highlights limitations in their performance. While some studies show that LLMs such as GPT-3 and GPT-3.5 can identify issues in student code, they also exhibit inconsistencies and inaccuracies in their feedback, and current state-of-the-art models still fall short of human performance when giving feedback on programming exercises. Separately, the idea of using LLMs to evaluate the output of other LLMs, termed LLMs-as-judges, has gained traction, with models like GPT-4 reaching high levels of agreement with human judgments. A minimal sketch of this setup appears below.
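
    To make the LLMs-as-judges idea concrete, here is a minimal sketch of prompting a judge model to grade a piece of generated feedback against a binary rubric. It assumes the OpenAI Python client; the rubric wording, the judge_feedback helper, and the exact criteria phrasing are illustrative paraphrases, not the study's actual prompts.

    # Minimal LLM-as-judge sketch (illustrative; not the paper's actual prompt or rubric).
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    RUBRIC = (
        "You are grading feedback given to a student about their code. "
        "Answer 'yes' or 'no', one line per criterion:\n"
        "- complete: does the feedback mention every actual issue in the code?\n"
        "- perceptive: are the issues it mentions actual issues?\n"
        "- selective: does it avoid flagging things that are not issues?"
    )

    def judge_feedback(student_code: str, feedback: str) -> str:
        """Ask a judge model to grade generated feedback against the rubric."""
        response = client.chat.completions.create(
            model="gpt-4",
            temperature=0,  # deterministic judgments make the evaluation repeatable
            messages=[
                {"role": "system", "content": RUBRIC},
                {"role": "user",
                 "content": f"Student code:\n{student_code}\n\nFeedback:\n{feedback}"},
            ],
        )
        return response.choices[0].message.content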

    Researchers from Aalto University, the University of Jyväskylä, and The University of Auckland present a thorough study assessing how effectively LLMs provide feedback on student-written programs, and whether open-source LLMs can rival proprietary ones in this regard. The focus is on feedback that detects errors in student code, such as compiler errors or test failures. First, programming feedback from GPT-4 is compared against expert human ratings, establishing a baseline for assessing the quality of LLM-generated feedback. Then, the feedback quality of various open-source LLMs is evaluated against that of proprietary models. To answer these research questions, both existing datasets and new feedback generated by open-source models are assessed with GPT-4 as the judge.

    The data come from an introductory programming course at Aalto University and consist of student help requests together with feedback generated by GPT-3.5. Evaluation focused on three criteria: completeness, perceptivity, and selectivity. Feedback was assessed both qualitatively by humans and automatically by GPT-4, and open-source LLMs were evaluated alongside proprietary ones using a rubric-based grading scheme. GPT-4's judgments of the LLM-generated feedback were then validated against the human annotations, with precision and the F0.5-score as the key metrics of the judge's performance.
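
    For intuition, precision and the F0.5-score (an F-beta score with beta = 0.5, so precision is weighted more heavily than recall) can be computed from the judge's binary decisions against human annotations as in this sketch; the label arrays are made-up placeholders, not data from the study.

    # Scoring a judge's binary decisions against human annotations (placeholder data).
    from sklearn.metrics import fbeta_score, precision_score

    human = [1, 0, 1, 1, 0, 1]  # human annotation, e.g. "is this feedback complete?"
    judge = [1, 0, 1, 0, 1, 1]  # the GPT-4 judge's decision on the same items

    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 0.5 emphasizes precision.
    precision = precision_score(human, judge)
    f05 = fbeta_score(human, judge, beta=0.5)

    print(f"precision={precision:.2f}  F0.5={f05:.2f}")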

    The results show that while most of the generated feedback is perceptive, only a little over half of it is complete, and much of it contains misleading content. GPT-4 grades feedback more positively than the human annotators do, indicating some positive bias. In classification terms, GPT-4 performs reasonably well on completeness, slightly worse on selectivity, and best on perceptivity, though the perceptivity score is inflated in part by skew in the data. Kappa scores indicate moderate agreement, with GPT-4 maintaining high recall across all criteria alongside reasonable precision and accuracy.
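
    The moderate-agreement figures refer to Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A small sketch with placeholder labels:

    # Chance-corrected agreement between the GPT-4 judge and a human annotator (placeholder data).
    from sklearn.metrics import cohen_kappa_score

    human = [1, 1, 0, 1, 0, 0, 1, 1]
    judge = [1, 1, 0, 0, 0, 1, 1, 1]

    kappa = cohen_kappa_score(human, judge)  # 1.0 = perfect agreement, 0.0 = chance level
    print(f"Cohen's kappa = {kappa:.2f}")    # values around 0.4-0.6 are often read as 'moderate'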

    In summary, this study examined how reliably GPT-4 can evaluate automatically generated programming feedback and how well various large language models, including open-source ones, generate feedback on student code. The results indicate that GPT-4 shows promise as a reliable judge of automatically generated feedback quality, and that open-source language models can generate useful programming feedback. LLM-generated feedback could therefore serve as a cost-effective and accessible resource in learning environments, freeing instructors and teaching assistants to focus on the harder cases where LLMs currently fall short in assisting students.
