Related papers: Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments

Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments

URL: http://arxiv.org/abs/2404.01754v1
Date: Tue, 2 Apr 2024 09:12:21 GMT
Title: Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments
Authors: Qianhui Zhao, Fang Liu, Li Zhang, Yang Liu, Zhen Yan, Zhenghao Chen, Yufei Zhou, Jing Jiang, Ge Li,
Abstract summary: We develop a framework called PaR that is powered by the Large Language Model. PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair. The evaluation on Defects4DS and another well-investigated ITSP dataset reveals that PaR achieves a new state-of-the-art performance.
Score: 26.236420215606238
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automated generation of feedback on programming assignments holds significant benefits for programming education, especially when it comes to advanced assignments. Automated Program Repair techniques, especially Large Language Model based approaches, have gained notable recognition for their potential to fix introductory assignments. However, the programs used for evaluation are relatively simple. It remains unclear how existing approaches perform in repairing programs from higher-level programming courses. To address these limitations, we curate a new advanced student assignment dataset named Defects4DS from a higher-level programming course. Subsequently, we identify the challenges related to fixing bugs in advanced assignments. Based on the analysis, we develop a framework called PaR that is powered by the LLM. PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair. Peer Solution Selection identifies the closely related peer programs based on lexical, semantic, and syntactic criteria. Then Multi-Source Prompt Generation adeptly combines multiple sources of information to create a comprehensive and informative prompt for the last Program Repair stage. The evaluation on Defects4DS and another well-investigated ITSP dataset reveals that PaR achieves a new state-of-the-art performance, demonstrating impressive improvements of 19.94% and 15.2% in repair rate compared to prior state-of-the-art LLM- and symbolic-based approaches, respectively

Related papers

Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs.<n>This paper investigates how programmers use Large Language Models to complement their own skills.<n>The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z)
Unlocking LLM Repair Capabilities in Low-Resource Programming Languages Through Cross-Language Translation and Multi-Agent Refinement [4.5051492144389504]
We introduce a novel cross-language program repair approach LANTERN. Our approach strategically translates defective code from languages where LLMs exhibit weaker repair capabilities to languages where they demonstrate stronger performance. We evaluate our method on xCodeEval, a comprehensive multilingual benchmark comprising 5,068 bugs across 11 programming languages.
arXiv Detail & Related papers (2025-03-28T15:15:56Z)
LLM Program Optimization via Retrieval Augmented Search [71.40092732256252]
We propose a blackbox adaptation method called Retrieval Augmented Search (RAS) that performs beam search over candidate optimizations. We show that RAS performs 1.8$times$ better than prior state-of-the-art blackbox adaptation strategies. We also propose a method called AEGIS for improving interpretability by decomposing training examples into "atomic edits"
arXiv Detail & Related papers (2025-01-31T06:34:47Z)
Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization [0.0]
Automated Program Repair (APR) for introductory programming assignments (IPAs) is motivated by the large number of student enrollments. We propose a novel approach that combines the strengths of both FM-based fault localization and Large Language Models (LLMs) Our method uses MaxSAT-based fault localization to identify buggy parts of a program, then presents the LLM with a program sketch devoid of these buggy statements.
arXiv Detail & Related papers (2024-12-19T12:08:44Z)
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks. We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model. Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
FastFixer: An Efficient and Effective Approach for Repairing Programming Assignments [21.848112758958543]
We propose FastFixer, an efficient and effective approach for programming assignment repair. We first propose a novel repair-oriented fine-tuning strategy, aiming to enhance the LLM's attention towards learning how to generate the necessary patch and its associated context. Considering the repair efficiency, FastFixer achieves a remarkable inference speedup of 16.67 times compared to the autoregressive decoding algorithm.
arXiv Detail & Related papers (2024-10-11T10:17:02Z)
Multi-Task Program Error Repair and Explanatory Diagnosis [28.711745671275477]
We present a novel machine-learning approach for Multi-task Program Error Repair and Explanatory Diagnosis (mPRED) A pre-trained language model is used to encode the source code, and a downstream model is specifically designed to identify and repair errors. To aid in visualizing and analyzing the program structure, we use a graph neural network for program structure visualization.
arXiv Detail & Related papers (2024-10-09T05:09:24Z)
RePair: Automated Program Repair with Process-based Feedback [28.017321930042694]
We show how small-scale language models (LM) can achieve excellent performance through process supervision and feedback. We develop a reward model that serves as a critic, providing feedback for the fine-tuned LM's action. The results show that process-based not only outperforms larger outcome-based generation methods, but also nearly matches the performance of closed-source commercial large-scale LMs.
arXiv Detail & Related papers (2024-08-21T02:53:23Z)
Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs) The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
Benchmarking Educational Program Repair [4.981275578987307]
Large language models (LLMs) can be used to generate learning resources, improve error messages, and provide feedback on code. There is a pressing need for standardization and benchmarks that facilitate the equitable comparison of competing approaches. In this article, we propose a novel educational program repair benchmark.
arXiv Detail & Related papers (2024-05-08T18:23:59Z)
Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies. By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors. The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z)
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning. During inference, we introduce a new generation procedure with a critical sampling strategy. For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification. A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors. Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
Graph-based, Self-Supervised Program Repair from Diagnostic Feedback [108.48853808418725]
We introduce a program-feedback graph, which connects symbols relevant to program repair in source code and diagnostic feedback. We then apply a graph neural network on top to model the reasoning process. We present a self-supervised learning paradigm for program repair that leverages unlabeled programs available online.
arXiv Detail & Related papers (2020-05-20T07:24:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.