A Survey on Feedback Types in Automated Programming Assessment Systems
- URL: http://arxiv.org/abs/2510.18923v1
- Date: Tue, 21 Oct 2025 09:08:22 GMT
- Title: A Survey on Feedback Types in Automated Programming Assessment Systems
- Authors: Eduard Frankford, Tobias Antensteiner, Michael Vierhauser, Clemens Sauerwein, Vivien Wallner, Iris Groher, Reinhold Plösch, Ruth Breu
- Abstract summary: This study investigates how different feedback mechanisms in APASs are perceived by students, and how effective they are in supporting problem-solving. Results indicate that while students rate unit test feedback as the most helpful, AI-generated feedback leads to significantly better performances.
- Score: 3.9845307287664973
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the recent rapid increase in digitization across all major industries, acquiring programming skills has increased the demand for introductory programming courses. This has further resulted in universities integrating programming courses into a wide range of curricula, including not only technical studies but also business and management fields of study. Consequently, additional resources are needed for teaching, grading, and tutoring students with diverse educational backgrounds and skills. As part of this, Automated Programming Assessment Systems (APASs) have emerged, providing scalable and high-quality assessment systems with efficient evaluation and instant feedback. Commonly, APASs heavily rely on predefined unit tests for generating feedback, often limiting the scope and level of detail of feedback that can be provided to students. With the rise of Large Language Models (LLMs) in recent years, new opportunities have emerged as these technologies can enhance feedback quality and personalization. To investigate how different feedback mechanisms in APASs are perceived by students, and how effective they are in supporting problem-solving, we have conducted a large-scale study with over 200 students from two different universities. Specifically, we compare baseline Compiler Feedback, standard Unit Test Feedback, and advanced LLM-based Feedback regarding perceived quality and impact on student performance. Results indicate that while students rate unit test feedback as the most helpful, AI-generated feedback leads to significantly better performances. These findings suggest combining unit tests and AI-driven guidance to optimize automated feedback mechanisms and improve learning outcomes in programming education.
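The contrast the abstract draws between compiler feedback and unit test feedback can be sketched in a few lines. The following is an illustrative example only, not code from the surveyed systems; the `solve` entry point, message strings, and test cases are hypothetical.

```python
# Illustrative sketch (not from the paper): how an APAS might derive the two
# baseline feedback types compared in the study.

def compiler_feedback(source: str) -> str:
    """Baseline feedback: report only whether the submission parses."""
    try:
        compile(source, "<submission>", "exec")
        return "OK: no syntax errors."
    except SyntaxError as err:
        return f"SyntaxError on line {err.lineno}: {err.msg}"

def unit_test_feedback(source: str, cases) -> str:
    """Unit-test feedback: run predefined cases and report each failure."""
    namespace = {}
    exec(compile(source, "<submission>", "exec"), namespace)
    solve = namespace["solve"]  # hypothetical required entry point
    failures = [f"solve{args!r} returned {solve(*args)!r}, expected {want!r}"
                for args, want in cases if solve(*args) != want]
    return "All tests passed." if not failures else "\n".join(failures)

# A correct and a buggy submission for a toy "sum two numbers" exercise.
good = "def solve(a, b):\n    return a + b\n"
bad = "def solve(a, b):\n    return a - b\n"
cases = [((1, 2), 3), ((0, 0), 0)]

print(compiler_feedback(bad))          # compiles fine, so no hint about the bug
print(unit_test_feedback(bad, cases))  # pinpoints the failing cases
print(unit_test_feedback(good, cases))
```

The buggy submission passes compiler feedback untouched but fails a unit test, which illustrates why unit tests dominate APAS feedback and also why their scope is bounded by the predefined cases, the gap LLM-based feedback aims to fill.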
Related papers
- Let the Barbarians In: How AI Can Accelerate Systems Performance Research [80.43506848683633]
We term this iterative cycle of generation, evaluation, and refinement AI-Driven Research for Systems. We demonstrate that ADRS-generated solutions can match or even outperform human state-of-the-art designs.
arXiv Detail & Related papers (2025-12-16T18:51:23Z)
- A Comparative Study of Technical Writing Feedback Quality: Evaluating LLMs, SLMs, and Humans in Computer Science Topics [3.2351366072725596]
This study investigates the quality of feedback generated by Large Language Models (LLMs), Small Language Models (SLMs), and artificial intelligence (AI) tools. We analyze the student perspective on feedback quality, evaluated based on multiple criteria, including readability, detail, specificity, actionability, helpfulness, and overall quality. Our findings underscore the potential of hybrid approaches that combine AI and human feedback to achieve efficient and high-quality feedback at scale.
arXiv Detail & Related papers (2025-12-01T22:51:54Z)
- The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
- Enhancing tutoring systems by leveraging tailored promptings and domain knowledge with Large Language Models [2.5362697136900563]
AI-driven tools like ChatGPT and Intelligent Tutoring Systems (ITS) have enhanced learning experiences through personalisation and flexibility. ITSs can adapt to individual learning needs and provide customised feedback based on a student's performance, cognitive state, and learning path. Our research aims to address these gaps by integrating skill-aligned feedback via Retrieval Augmented Generation (RAG) into prompt engineering for Large Language Models (LLMs).
arXiv Detail & Related papers (2025-05-02T02:30:39Z)
- A Zero-Shot LLM Framework for Automatic Assignment Grading in Higher Education [0.6141800972050401]
We propose a Zero-Shot Large Language Model (LLM)-Based Automated Assignment Grading (AAG) system. This framework leverages prompt engineering to evaluate both computational and explanatory student responses without requiring additional training or fine-tuning. The AAG system delivers tailored feedback that highlights individual strengths and areas for improvement, thereby enhancing student learning outcomes.
arXiv Detail & Related papers (2025-01-24T08:01:41Z) - Personalised Feedback Framework for Online Education Programmes Using Generative AI [0.0]
This paper presents an alternative feedback framework which extends the capabilities of ChatGPT by integrating embeddings.
As part of the study, we proposed and developed a proof of concept solution, achieving efficacy rates of 90% and 100% for open-ended and multiple-choice questions, respectively.
arXiv Detail & Related papers (2024-10-14T22:35:40Z) - Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants [176.39275404745098]
We evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer assessment questions. GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
arXiv Detail & Related papers (2024-08-07T12:11:49Z) - Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [46.667783153759636]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL). Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
arXiv Detail & Related papers (2024-03-02T20:25:50Z) - Empowering Private Tutoring by Chaining Large Language Models [87.76985829144834]
This work explores the development of a full-fledged intelligent tutoring system powered by state-of-the-art large language models (LLMs).
The system is divided into three interconnected core processes: interaction, reflection, and reaction.
Each process is implemented by chaining LLM-powered tools along with dynamically updated memory modules.
arXiv Detail & Related papers (2023-09-15T02:42:03Z) - Investigating Fairness Disparities in Peer Review: A Language Model
Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - An Analysis of Programming Course Evaluations Before and After the
Introduction of an Autograder [1.329950749508442]
This paper studies the answers to the standardized university evaluation questionnaires of foundational computer science courses which recently introduced autograding.
We hypothesize how the autograder might have contributed to the significant changes in the data, such as, improved interactions between tutors and students, improved overall course quality, improved learning success, increased time spent, and reduced difficulty.
The results support the autograder as a validated teaching tool that improves student satisfaction with programming courses.
arXiv Detail & Related papers (2021-10-28T14:09:44Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples provided by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.