Large Language Models (GPT) for automating feedback on programming assignments
- URL: http://arxiv.org/abs/2307.00150v1
- Date: Fri, 30 Jun 2023 21:57:40 GMT
- Title: Large Language Models (GPT) for automating feedback on programming assignments
- Authors: Maciej Pankiewicz and Ryan S. Baker
- Abstract summary: We employ OpenAI's GPT-3.5 model to generate personalized hints for students solving programming assignments.
Students rated the usefulness of GPT-generated hints positively.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Addressing the challenge of generating personalized feedback for programming
assignments is demanding due to several factors, such as the complexity of code
syntax and the many ways a task can be solved correctly. In this experimental study,
we automated the process of feedback generation by employing OpenAI's GPT-3.5
model to generate personalized hints for students solving programming
assignments on an automated assessment platform. Students rated the usefulness
of GPT-generated hints positively. The experimental group (with GPT hints
enabled) relied less on the platform's regular feedback but performed better in
terms of percentage of successful submissions across consecutive attempts for
tasks where GPT hints were enabled. For tasks where the GPT feedback was made
unavailable, the experimental group needed significantly less time to solve
assignments. Furthermore, when GPT hints were unavailable, students in the
experimental condition were initially less likely to solve the assignment
correctly. This suggests potential over-reliance on GPT-generated feedback.
However, students in the experimental condition were able to catch up reasonably
rapidly, reaching the same percentage correct after seven submission attempts.
The availability of GPT hints did not significantly impact students' affective
state.
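To illustrate the kind of pipeline the abstract describes, here is a minimal sketch of hint generation with OpenAI's chat completions API. The prompt wording, model settings, and function name are assumptions for illustration; the study's actual prompts and platform integration are not reproduced here.

```python
# Minimal sketch of GPT-based hint generation, assuming the openai Python
# client (pip install openai) and an OPENAI_API_KEY in the environment.
# The prompt structure below is a guess for illustration, not the paper's setup.
from openai import OpenAI

client = OpenAI()

def generate_hint(task_description: str, student_code: str) -> str:
    """Ask GPT-3.5 for a personalized hint without revealing the full solution."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You are a programming tutor. Give a short hint that "
                        "helps the student fix their code, but do not give "
                        "away the complete solution."},
            {"role": "user",
             "content": f"Task:\n{task_description}\n\nStudent submission:\n{student_code}"},
        ],
        temperature=0.2,  # keep hints focused rather than creative
    )
    return response.choices[0].message.content

# Hypothetical example call:
# print(generate_hint("Sum the integers in a list.", "def s(xs): return xs"))
```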
Related papers
- Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams.
Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques. A hypothetical sketch of such a multimodal grading call follows this entry.
arXiv Detail & Related papers (2024-11-07T22:51:47Z)
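As a purely hypothetical illustration of a multimodal grading call like the one this entry describes, the sketch below sends a scanned answer image to GPT-4o through the OpenAI chat completions API. The prompt wording, rubric handling, and function name are invented for illustration and are not the paper's protocol.

```python
# Hypothetical sketch: asking GPT-4o to grade a scanned handwritten answer.
# Assumes the openai Python client and a local PNG scan of the response.
import base64
from openai import OpenAI

client = OpenAI()

def grade_handwritten(image_path: str, question: str, rubric: str) -> str:
    """Send the question, a rubric, and the scanned answer to GPT-4o for grading."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Question: {question}\nRubric: {rubric}\n"
                         "Grade the handwritten solution in the image; "
                         "return a score and a one-line justification."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```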
- The application of GPT-4 in grading design university students' assignment and providing feedback: An exploratory study [5.915297169078896]
This study employs an iterative research approach in developing a Custom GPT.
The inter-rater reliability between GPT and human raters reached a level that is generally accepted by educators.
With adequate instructions, a Custom GPT gives consistent results, which is a precondition for grading students' work.
arXiv Detail & Related papers (2024-09-26T10:09:10Z)
- Navigating Compiler Errors with AI Assistance - A Study of GPT Hints in an Introductory Programming Course [0.23020018305241333]
We examined the efficacy of AI-assisted learning in an introductory programming course at the university level.
We used a GPT-4 model to generate personalized hints for compiler errors within a platform for automated assessment of programming assignments.
For the six most commonly occurring error types, we observed mixed results in performance when access to GPT hints was enabled for the experimental group.
arXiv Detail & Related papers (2024-03-19T13:54:14Z)
- Feedback-Generation for Programming Exercises With GPT-4 [0.0]
This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input.
The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material.
arXiv Detail & Related papers (2024-03-07T12:37:52Z)
- Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO); a standard form of the DPO objective is shown after this entry.
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
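For context on the method this entry summarizes: DPO trains the generator directly on preference pairs without a separate reward model. A standard form of the objective (following Rafailov et al., 2023; whether the paper modifies it is not stated in this summary) is:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```

Here x would be the problem plus the student's answer, y_w the GPT-4-preferred feedback, y_l the dispreferred feedback, \pi_{\mathrm{ref}} a frozen reference model, and \beta a scaling coefficient.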
- Students' Perceptions and Preferences of Generative Artificial Intelligence Feedback for Programming [15.372316943507506]
We generated automated feedback using the ChatGPT API for four lab assignments in an introductory computer science class.
Students perceived the feedback as aligning well with formative feedback guidelines established by Shute.
Students generally expected specific and corrective feedback with sufficient code examples, but had divergent opinions on the tone of the feedback.
arXiv Detail & Related papers (2023-12-17T22:26:53Z)
- AI-enhanced Auto-correction of Programming Exercises: How Effective is GPT-3.5? [0.0]
This paper investigates the potential of AI in providing personalized code correction and generating feedback.
GPT-3.5 exhibited weaknesses in its evaluations, including pointing to locations that were not the actual errors and even hallucinating errors.
arXiv Detail & Related papers (2023-10-24T10:35:36Z)
- Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation [114.80518907146792]
We investigate the potential of utilizing large-scale language models, such as GPT-k, to improve the prompt editing process for text-to-image generation.
We compare the common edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting T2I, and examine factors that may influence this process.
arXiv Detail & Related papers (2023-05-18T21:53:58Z)
- News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets.
We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with state-of-the-art approaches that use additional training data. A toy sketch of the entropy-minimization idea follows this entry.
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
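The toy sketch below shows only the core TPT idea: adapt a small prompt parameter on a single test sample by minimizing the entropy of the prediction averaged over augmented views. The encoders here are random stand-ins, not CLIP; real TPT tunes CLIP's text prompt tokens and keeps only confident views.

```python
# Toy sketch of test-time prompt tuning: stand-in features, NOT CLIP.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_classes, dim, n_views = 5, 64, 8

class_embeddings = torch.randn(n_classes, dim)       # frozen per-class "text" features
prompt_shift = torch.zeros(dim, requires_grad=True)  # the only parameter tuned at test time
optimizer = torch.optim.SGD([prompt_shift], lr=0.1)

test_features = torch.randn(dim)                     # one test sample, already "encoded"

for _ in range(10):
    # Stand-in for augmentation: jitter the single test sample into several views.
    views = test_features + 0.1 * torch.randn(n_views, dim)
    logits = views @ (class_embeddings + prompt_shift).t()
    avg_probs = F.softmax(logits, dim=-1).mean(dim=0)  # prediction averaged over views
    entropy = -(avg_probs * avg_probs.clamp_min(1e-8).log()).sum()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()

print("predicted class:", avg_probs.argmax().item())
```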
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples provided by instructors; a toy prototype-based sketch of this framing follows this entry.
Our approach was successfully deployed to deliver feedback to 16,000 student exam solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
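To make the few-shot classification framing concrete, here is a minimal prototypical-network-style sketch: each feedback class gets a prototype averaged from a few instructor-labeled examples, and a new submission is assigned to the nearest prototype. This illustrates the general idea only; ProtoTransformer's actual encoder, architecture, and training procedure are not reproduced here.

```python
# Minimal prototype-based few-shot classification sketch (random embeddings
# stand in for encoded student code; not ProtoTransformer itself).
import torch

def prototypes(support_emb: torch.Tensor, support_labels: torch.Tensor,
               n_classes: int) -> torch.Tensor:
    """Mean embedding per feedback class from a few instructor-labeled examples."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(n_classes)])

def classify(query_emb: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Assign each query (an encoded student submission) to the nearest prototype."""
    dists = torch.cdist(query_emb, protos)  # (n_query, n_classes) distances
    return dists.argmin(dim=1)

# Toy usage:
torch.manual_seed(0)
support = torch.randn(6, 32)                  # 3 examples for each of 2 feedback classes
labels = torch.tensor([0, 0, 0, 1, 1, 1])
protos = prototypes(support, labels, n_classes=2)
queries = torch.randn(4, 32)                  # unlabeled student submissions
print(classify(queries, protos))              # predicted feedback class per submission
```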
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.