Feedback-Generation for Programming Exercises With GPT-4
- URL: http://arxiv.org/abs/2403.04449v2
- Date: Thu, 4 Jul 2024 07:30:22 GMT
- Title: Feedback-Generation for Programming Exercises With GPT-4
- Authors: Imen Azaiz, Natalie Kiesler, Sven Strickroth
- Abstract summary: This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input.
The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ever since Large Language Models (LLMs) and related applications have become broadly available, several studies have investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if they are provided in a timely manner and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For example, the output is more structured and consistent. GPT-4 Turbo can also accurately identify invalid casing in student programs' output. In some cases, the feedback also includes the output of the student program. At the same time, inconsistent feedback was noted, such as stating that the submission is correct but an error needs to be fixed. The present work increases our understanding of LLMs' potential, limitations, and how to integrate them into e-assessment systems, pedagogical scenarios, and instructing students who are using applications based on GPT-4.
Related papers
- See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses [51.975495361024606]
We propose a Self-Challenge evaluation framework with human-in-the-loop.
Starting from seed instances that GPT-4 fails to answer, we prompt GPT-4 to summarize error patterns that can be used to generate new instances.
We then build a benchmark, SC-G4, consisting of 1,835 instances generated by GPT-4 using these patterns, with human-annotated gold responses.
arXiv Detail & Related papers (2024-08-16T19:01:52Z) - Leveraging Lecture Content for Improved Feedback: Explorations with GPT-4 and Retrieval Augmented Generation [0.0]
This paper presents the use of Retrieval Augmented Generation to improve the feedback generated by Large Language Models for programming tasks.
Corresponding lecture recordings were transcribed and made available to the Large Language Model GPT-4 as an external knowledge source.
The purpose of this is to prevent hallucinations and to enforce the use of the technical terms and phrases from the lecture.
arXiv Detail & Related papers (2024-05-05T18:32:06Z) - Evaluating the Application of Large Language Models to Generate Feedback in Programming Education [0.0]
This study investigates the application of large language models, specifically GPT-4, to enhance programming education.
The research outlines the design of a web application that uses GPT-4 to provide feedback on programming tasks, without giving away the solution.
arXiv Detail & Related papers (2024-03-13T23:14:35Z) - Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
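The DPO training mentioned above optimizes a model directly on preference pairs rather than via a separate reward model. The following is a minimal toy sketch of the per-pair DPO loss; all names, inputs, and the beta value are illustrative assumptions, not details taken from the paper:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (toy scalar version).

    logp_* are the policy's log-probabilities of the preferred
    (chosen) and dispreferred (rejected) feedback texts; ref_logp_*
    are the same quantities under the frozen reference model.
    """
    # How much the policy has shifted relative to the reference
    # model on each response.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # -log(sigmoid(z)) written as log1p(exp(-z)) for stability.
    z = beta * (chosen_margin - rejected_margin)
    return math.log1p(math.exp(-z))
```

When the policy matches the reference model, the loss equals log(2); it decreases as the policy places relatively more probability on the preferred feedback. Real implementations compute these log-probabilities over token sequences and average the loss over a batch.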
arXiv Detail & Related papers (2024-03-02T20:25:50Z) - Real Customization or Just Marketing: Are Customized Versions of ChatGPT Useful? [0.0]
OpenAI has made it possible to fine-tune its model through a natural-language web interface.
This research assesses the potential of the customized GPTs that have recently been launched by OpenAI.
arXiv Detail & Related papers (2023-11-27T15:46:15Z) - GPT-4 as an interface between researchers and computational software: improving usability and reproducibility [44.99833362998488]
We focus on a widely used software for molecular dynamics simulations.
We quantify the usefulness of input files generated by GPT-4 from task descriptions in English.
We find that GPT-4 can generate correct and ready-to-use input files for relatively simple tasks.
In addition, GPT-4's description of computational tasks from input files can be tuned from a detailed set of step-by-step instructions to a summary description appropriate for publications.
arXiv Detail & Related papers (2023-10-04T14:25:39Z) - Large Language Models (GPT) for automating feedback on programming assignments [0.0]
We employ OpenAI's GPT-3.5 model to generate personalized hints for students solving programming assignments.
Students rated the usefulness of GPT-generated hints positively.
arXiv Detail & Related papers (2023-06-30T21:57:40Z) - Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses [0.0]
GPT models evolved from completely failing the typical programming class's assessments to confidently passing the courses with no human involvement.
This study provides evidence that programming instructors need to prepare for a world in which there is an easy-to-use technology that can be utilized by learners to collect passing scores.
arXiv Detail & Related papers (2023-06-15T22:12:34Z) - Generalized Planning in PDDL Domains with Pretrained Large Language
Models [82.24479434984426]
We consider PDDL domains and use GPT-4 to synthesize Python programs.
We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines.
arXiv Detail & Related papers (2023-05-18T14:48:20Z) - Instruction Tuning with GPT-4 [107.55078894215798]
We present the first attempt to use GPT-4 to generate instruction-following data for finetuning large language models.
Our early experiments on instruction-tuned LLaMA models show that the 52K English and Chinese instruction-following data generated by GPT-4 leads to superior zero-shot performance on new tasks.
arXiv Detail & Related papers (2023-04-06T17:58:09Z) - GPT-4 Technical Report [116.90398195245983]
GPT-4 is a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.
arXiv Detail & Related papers (2023-03-15T17:15:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.