Generating High-Precision Feedback for Programming Syntax Errors using
Large Language Models
- URL: http://arxiv.org/abs/2302.04662v2
- Date: Fri, 28 Apr 2023 11:33:44 GMT
- Title: Generating High-Precision Feedback for Programming Syntax Errors using
Large Language Models
- Authors: Tung Phung, José Cambronero, Sumit Gulwani, Tobias Kohn, Rupak
Majumdar, Adish Singla, Gustavo Soares
- Abstract summary: Large language models (LLMs) hold great promise in enhancing programming education by automatically generating feedback for students.
We introduce PyFiXV, our technique to generate high-precision feedback powered by Codex.
- Score: 23.25258654890813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs), such as Codex, hold great promise in enhancing
programming education by automatically generating feedback for students. We
investigate using LLMs to generate feedback for fixing syntax errors in Python
programs, a key scenario in introductory programming. More concretely, given a
student's buggy program, our goal is to generate feedback comprising a fixed
program along with a natural language explanation describing the errors/fixes,
inspired by how a human tutor would give feedback. While using LLMs is
promising, the critical challenge is to ensure high precision in the generated
feedback, which is imperative before deploying such technology in classrooms.
The main research question we study is: Can we develop LLMs-based feedback
generation techniques with a tunable precision parameter, giving educators
quality control over the feedback that students receive? To this end, we
introduce PyFiXV, our technique to generate high-precision feedback powered by
Codex. The key idea behind PyFiXV is to use a novel run-time validation
mechanism to decide whether the generated feedback is suitable for sharing with
the student; notably, this validation mechanism also provides a precision knob
to educators. We perform an extensive evaluation using two real-world datasets
of Python programs with syntax errors and show the efficacy of PyFiXV in
generating high-precision feedback.
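
The abstract describes, but does not spell out, the run-time validation mechanism behind the precision knob. As a rough illustration only, the sketch below shows one plausible shape such a validator could take, assuming hypothetical `generate_feedback` and `regenerate_fix` callables standing in for LLM queries: the candidate fix must compile as Python, and the explanation must be informative enough that the model can re-derive a near-identical fix from it in a sufficient fraction of samples, with that fraction's threshold playing the role of the precision knob. This is a sketch of the idea, not PyFiXV's actual algorithm.

```python
import difflib
from typing import Callable, Optional, Tuple


def validate_feedback(
    buggy_program: str,
    generate_feedback: Callable[[str], Tuple[str, str]],
    regenerate_fix: Callable[[str, str], str],
    n_samples: int = 5,
    threshold: float = 0.6,  # the "precision knob": higher -> stricter
) -> Optional[Tuple[str, str]]:
    """Return (fixed_program, explanation) if the feedback passes validation, else None.

    `generate_feedback` maps a buggy program to a candidate (fix, explanation);
    `regenerate_fix` maps (buggy program, explanation) back to a fix.  Both are
    hypothetical placeholders for LLM calls, not a real API.
    """
    fixed_program, explanation = generate_feedback(buggy_program)

    # 1) The proposed fix must at least be syntactically valid Python.
    try:
        compile(fixed_program, "<student-fix>", "exec")
    except SyntaxError:
        return None

    # 2) The explanation must be informative enough that the model can
    #    re-derive an (almost) identical fix from it; count how often it does.
    agreements = 0
    for _ in range(n_samples):
        candidate = regenerate_fix(buggy_program, explanation)
        similarity = difflib.SequenceMatcher(None, candidate, fixed_program).ratio()
        if similarity >= 0.95:
            agreements += 1

    if agreements / n_samples >= threshold:
        return fixed_program, explanation
    return None  # withhold low-confidence feedback from the student
```

Raising `threshold` (or `n_samples`) makes the validator stricter, trading coverage of student programs for precision of the feedback that gets through, which is the trade-off the abstract exposes to educators.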
Related papers
- Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? [3.7399138244928145]
We study the capabilities of large language models to generate feedback for open-ended math questions.
We find that open-source and proprietary models both show promise in replicating the feedback they see during training, but do not generalize well to previously unseen student errors.
arXiv Detail & Related papers (2024-05-10T11:53:53Z)
- Generating Feedback-Ladders for Logical Errors in Programming using Large Language Models [2.1485350418225244]
Large language model (LLM)-based methods have shown great promise in feedback generation for programming assignments.
This paper explores using LLMs to generate a "feedback-ladder", i.e., multiple levels of feedback for the same problem-submission pair.
We evaluate the quality of the generated feedback-ladder via a user study with students, educators, and researchers.
arXiv Detail & Related papers (2024-05-01T03:52:39Z)
- Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
- How Helpful do Novice Programmers Find the Feedback of an Automated Repair Tool? [1.2990666399718034]
We describe our experience of using CLARA, an automated repair tool, to provide feedback to novices.
First, we extended CLARA to support a larger subset of the Python language, before integrating it with the Jupyter Notebooks used for our programming exercises.
We found that novices often struggled to understand the proposed repairs, echoing the well-known challenge to understand compiler/interpreter messages.
arXiv Detail & Related papers (2023-10-02T07:45:56Z)
- Exploring the Potential of Large Language Models to Generate Formative Programming Feedback [0.5371337604556311]
We explore the potential of large language models (LLMs) for computing educators and learners.
To achieve these goals, we used students' programming sequences from a dataset gathered within a CS1 course as input for ChatGPT.
Results show that ChatGPT performs reasonably well for some of the introductory programming tasks and student errors.
However, educators should provide guidance on how to use the provided feedback, as it can contain misleading information for novices.
arXiv Detail & Related papers (2023-08-31T15:22:11Z)
- LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Improving Code Generation by Training with Natural Language Feedback [69.52985513422381]
We formalize an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF).
ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient.
We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark.
arXiv Detail & Related papers (2023-03-28T16:15:31Z)
- Giving Feedback on Interactive Student Programs with Meta-Exploration [74.5597783609281]
Developing interactive software, such as websites or games, is a particularly engaging way to learn computer science.
Standard approaches require instructors to manually grade student-implemented interactive programs.
Online platforms that serve millions, like Code.org, are unable to provide any feedback on assignments for implementing interactive programs.
arXiv Detail & Related papers (2022-11-16T10:00:23Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback on student code for a new programming question from just a few examples provided by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
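
The ProtoTransformer entry above frames feedback as few-shot classification over a handful of instructor-labeled examples. As a toy illustration of that framing only (not the ProtoTransformer architecture itself), the sketch below builds one prototype per feedback message from a small support set and labels a new submission by nearest prototype under cosine similarity; the bag-of-tokens embedding is a stand-in for a real, learned code encoder.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def embed(code: str) -> Counter:
    """Toy encoder: bag of whitespace-separated tokens (a real system would learn this)."""
    return Counter(code.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prototypes(support: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Pool token counts of the few labeled examples per feedback class.

    Cosine similarity is scale-invariant, so summing works like averaging here.
    """
    prototypes: Dict[str, Counter] = {}
    for feedback, examples in support.items():
        proto: Counter = Counter()
        for ex in examples:
            proto.update(embed(ex))
        prototypes[feedback] = proto
    return prototypes


def predict_feedback(submission: str, prototypes: Dict[str, Counter]) -> str:
    """Assign the feedback class whose prototype is most similar to the submission."""
    return max(prototypes, key=lambda f: cosine(embed(submission), prototypes[f]))


# Usage: a few instructor-labeled examples per feedback message (the "support set").
support = {
    "off-by-one in loop bound": ["for i in range(n + 1): total += i"],
    "missing return statement": ["def add(a, b): c = a + b"],
}
prototypes = build_prototypes(support)
print(predict_feedback("def mul(a, b): c = a * b", prototypes))
```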