Humanizing Automated Programming Feedback: Fine-Tuning Generative Models with Student-Written Feedback
- URL: http://arxiv.org/abs/2509.10647v1
- Date: Fri, 12 Sep 2025 19:23:05 GMT
- Title: Humanizing Automated Programming Feedback: Fine-Tuning Generative Models with Student-Written Feedback
- Authors: Victor-Alexandru Pădurean, Tung Phung, Nachiket Kotalwar, Michael Liut, Juho Leinonen, Paul Denny, Adish Singla
- Abstract summary: We explore learnersourcing as a means to fine-tune language models for generating feedback that is more similar to that written by humans. We collected approximately 1,900 instances of student-written feedback on multiple programming problems and buggy programs. Our findings indicate that fine-tuning models on learnersourced data not only produces feedback that better matches the style of feedback written by students, but also improves accuracy compared to feedback generated through prompt engineering alone.
- Score: 21.114005575615586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing need for automated and personalized feedback in programming education has led to recent interest in leveraging generative AI for feedback generation. However, current approaches tend to rely on prompt engineering techniques in which predefined prompts guide the AI to generate feedback. This can result in rigid and constrained responses that fail to accommodate the diverse needs of students and do not reflect the style of human-written feedback from tutors or peers. In this study, we explore learnersourcing as a means to fine-tune language models for generating feedback that is more similar to that written by humans, particularly peer students. Specifically, we asked students to act in the flipped role of a tutor and write feedback on programs containing bugs. We collected approximately 1,900 instances of student-written feedback on multiple programming problems and buggy programs. To establish a baseline for comparison, we analyzed a sample of 300 instances based on correctness, length, and how the bugs are described. Using this data, we fine-tuned open-access generative models, specifically Llama3 and Phi3. Our findings indicate that fine-tuning models on learnersourced data not only produces feedback that better matches the style of feedback written by students, but also improves accuracy compared to feedback generated through prompt engineering alone, even though some student-written feedback is incorrect. This surprising finding highlights the potential of student-centered fine-tuning to improve automated feedback systems in programming education.
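For illustration, a minimal sketch of the kind of supervised fine-tuning the abstract describes, using Hugging Face's trl library; the data file, prompt template, and hyperparameters here are hypothetical, not taken from the paper:

```python
# Hedged sketch: fine-tune an open-access model on (buggy program, feedback)
# pairs. File name, prompt template, and hyperparameters are illustrative.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumes a JSONL file where each record has "buggy_program" and "feedback".
dataset = load_dataset("json", data_files="student_feedback.jsonl", split="train")

def to_text(example):
    # Fold prompt and target into the single "text" field SFTTrainer expects.
    return {
        "text": (
            "Below is a buggy program. Write feedback for the student.\n\n"
            f"Program:\n{example['buggy_program']}\n\n"
            f"Feedback: {example['feedback']}"
        )
    }

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # an open model, as named in the abstract
    train_dataset=dataset,
    args=SFTConfig(output_dir="feedback-sft", num_train_epochs=3),
)
trainer.train()
```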
Related papers
- AI-Mediated Feedback Improves Student Revisions: A Randomized Trial with FeedbackWriter in a Large Undergraduate Course [5.829384802817518]
We introduce and deploy FeedbackWriter, a system that generates AI suggestions for teaching assistants (TAs) while they provide feedback on students' knowledge-intensive essays. Students were randomly assigned to receive either handwritten feedback from TAs or AI-mediated feedback, where TAs received suggestions from FeedbackWriter. We found that students receiving AI-mediated feedback produced significantly higher-quality revisions, with gains increasing as TAs adopted more AI suggestions.
arXiv Detail & Related papers (2026-02-18T19:33:56Z) - Can Automated Feedback Turn Students into Happy Prologians? [0.9087641068861047]
Students found all implemented feedback types helpful, with automatic testing ranked as the most useful. We introduce a dataset comprising 7201 correct and incorrect Prolog submissions, along with 200 manually annotated programs labeled with bug types and corresponding corrections.
arXiv Detail & Related papers (2025-04-23T14:11:54Z) - Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation [67.88747330066049]
Fine-grained feedback captures nuanced distinctions in image quality and prompt alignment.
We show that its superiority to coarse-grained feedback is not automatic.
We identify key challenges in eliciting and utilizing fine-grained feedback.
arXiv Detail & Related papers (2024-06-24T17:19:34Z) - Generating Feedback-Ladders for Logical Errors in Programming using Large Language Models [2.1485350418225244]
Large language model (LLM)-based methods have shown great promise in feedback generation for programming assignments.
This paper explores using LLMs to generate a "feedback-ladder", i.e., multiple levels of feedback for the same problem-submission pair.
We evaluate the quality of the generated feedback-ladder via a user study with students, educators, and researchers.
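A rough sketch of how such a ladder might be prompted from an LLM; the level definitions and prompt wording are illustrative, not the paper's:

```python
# Hypothetical feedback-ladder prompting sketch; the levels and prompt
# wording are illustrative, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LEVELS = [
    "a gentle hint about where to look",
    "a description of the faulty behavior",
    "an explanation of the underlying logical error",
    "a concrete suggestion for fixing the bug",
]

def feedback_ladder(problem: str, submission: str) -> list[str]:
    ladder = []
    for level in LEVELS:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": (
                    f"Problem:\n{problem}\n\nStudent submission:\n{submission}\n\n"
                    f"Give the student {level}. Reveal nothing beyond that level."
                ),
            }],
        )
        ladder.append(resp.choices[0].message.content)
    return ladder
```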
arXiv Detail & Related papers (2024-05-01T03:52:39Z) - Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [46.667783153759636]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL). Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
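A minimal DPO sketch under stated assumptions: API names follow a recent version of trl, and the model choice, file name, and column layout are hypothetical, not the paper's:

```python
# Sketch of DPO training on preference-annotated feedback pairs; trl's
# DPOTrainer expects "prompt"/"chosen"/"rejected" columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "microsoft/Phi-3-mini-4k-instruct"  # any open causal LM
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each record: the problem/submission as "prompt", the GPT-4-preferred
# feedback as "chosen", and the dispreferred feedback as "rejected".
pairs = load_dataset("json", data_files="feedback_preferences.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="feedback-dpo", beta=0.1),
    train_dataset=pairs,
    processing_class=tokenizer,
)
trainer.train()
```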
arXiv Detail & Related papers (2024-03-02T20:25:50Z) - Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance large language models alignment.
We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems.
By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
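As a rough illustration of the routing idea only; the thresholds and the difficulty measure are hypothetical, not taken from the paper:

```python
# Hypothetical sketch of routing problems to feedback types by difficulty.
def feedback_type(difficulty: float) -> str:
    """Pick a feedback type from a difficulty score in [0, 1]."""
    if difficulty < 0.33:
        return "critique"    # easy: point out what is wrong
    if difficulty < 0.66:
        return "refinement"  # medium: show an improved response
    return "preference"      # hard: compare candidate responses pairwise
```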
arXiv Detail & Related papers (2023-10-10T09:20:14Z) - UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
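A quick look at the data, assuming the dataset is published as "openbmb/UltraFeedback" on the Hugging Face Hub; inspect one record to confirm the schema before relying on it:

```python
from datasets import load_dataset

ds = load_dataset("openbmb/UltraFeedback", split="train")
print(ds[0])  # instruction plus model completions with AI feedback annotations
```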
arXiv Detail & Related papers (2023-10-02T17:40:01Z) - Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation [68.9440575276396]
This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation.
First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization.
Second, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models.
Third, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.
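A minimal LLM-as-judge sketch of the principle-based AI feedback the survey covers; the principles and prompt are illustrative, not drawn from the survey:

```python
# Hypothetical principle-based judging sketch using an LLM.
from openai import OpenAI

client = OpenAI()

PRINCIPLES = ["helpfulness", "honesty", "harmlessness"]

def judge(prompt: str, response: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Rate the response to the prompt on each principle "
                f"({', '.join(PRINCIPLES)}) from 1 to 5, one line per principle.\n\n"
                f"Prompt: {prompt}\nResponse: {response}"
            ),
        }],
    )
    return resp.choices[0].message.content
```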
arXiv Detail & Related papers (2023-05-01T17:36:06Z) - Giving Feedback on Interactive Student Programs with Meta-Exploration [74.5597783609281]
Developing interactive software, such as websites or games, is a particularly engaging way to learn computer science.
Standard approaches require instructors to manually grade student-implemented interactive programs.
Online platforms that serve millions, like Code.org, are unable to provide any feedback on assignments for implementing interactive programs.
arXiv Detail & Related papers (2022-11-16T10:00:23Z) - Feedback and Engagement on an Introductory Programming Module [0.0]
We ran a study on engagement and achievement in a first-year undergraduate programming module that used an online learning environment with tasks that generate automated feedback.
We gathered quantitative data on engagement and achievement which allowed us to split the cohort into 6 groups.
We then ran interviews with students after the end of the module to produce qualitative data on perceptions of what feedback is, how useful it is, the uses made of it, and how it bears on engagement.
arXiv Detail & Related papers (2022-01-04T16:53:09Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback on student code for a new programming question from just a few instructor-provided examples.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
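A sketch of the few-shot, prototype-based classification idea (not the paper's actual architecture): average each class's support embeddings into a prototype, then label a query by its nearest prototype:

```python
# Prototype-based few-shot classification sketch in plain PyTorch.
import torch

def classify(query_emb: torch.Tensor,
             support_embs: torch.Tensor,
             support_labels: torch.Tensor,
             num_classes: int) -> int:
    """query_emb: (d,); support_embs: (n, d); support_labels: (n,)."""
    prototypes = torch.stack([
        support_embs[support_labels == c].mean(dim=0)
        for c in range(num_classes)
    ])  # shape: (num_classes, d)
    # Assign the query to the prototype with the smallest squared distance.
    dists = ((prototypes - query_emb) ** 2).sum(dim=1)
    return int(dists.argmin())
```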
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - Effects of Human vs. Automatic Feedback on Students' Understanding of AI Concepts and Programming Style [0.0]
The use of automatic grading tools has become nearly ubiquitous in large undergraduate programming courses.
There is a relative lack of data directly comparing student outcomes when receiving computer-generated feedback and human-written feedback.
This paper addresses this gap by splitting one 90-student class into two feedback groups and analyzing differences in the two cohorts' performance.
arXiv Detail & Related papers (2020-11-20T21:40:32Z)