How Far Can We Go with Practical Function-Level Program Repair?
- URL: http://arxiv.org/abs/2404.12833v2
- Date: Thu, 31 Oct 2024 08:54:53 GMT
- Title: How Far Can We Go with Practical Function-Level Program Repair?
- Authors: Jiahong Xiang, Xiaoyang Xu, Fanchu Kong, Mingyuan Wu, Zizheng Zhang, Haotian Zhang, Yuqun Zhang
- Abstract summary: This paper investigates the effect of few-shot learning mechanism and the auxiliary repair-relevant information on function-level APR.
We propose an LLM-based function-level APR technique, namely SRepair, which adopts a dual-LLM framework to leverage the power of the auxiliary repair-relevant information.
- Abstract: Recently, multiple Automated Program Repair (APR) techniques based on Large Language Models (LLMs) have been proposed to enhance repair performance. While these techniques mainly focus on single-line or hunk-level repair, they face significant challenges in real-world application due to their limited repair task scope and costly statement-level fault localization. However, the more practical function-level APR, which broadens the scope of the APR task to fixing entire buggy functions and requires only cost-efficient function-level fault localization, remains underexplored. In this paper, we conduct the first comprehensive study of LLM-based function-level APR, including investigating the effect of the few-shot learning mechanism and the auxiliary repair-relevant information. Specifically, we adopt six widely-studied LLMs and construct a benchmark from both the Defects4J 1.2 and 2.0 datasets. Our study demonstrates that LLMs with zero-shot learning are already powerful function-level APR techniques, while applying the few-shot learning mechanism leads to disparate repair performance. Moreover, we find that directly applying the auxiliary repair-relevant information to LLMs significantly increases function-level repair performance. Inspired by our findings, we propose an LLM-based function-level APR technique, namely SRepair, which adopts a dual-LLM framework to leverage the auxiliary repair-relevant information for advancing repair performance. The evaluation results demonstrate that SRepair can correctly fix 300 single-function bugs in the Defects4J dataset, surpassing all previous APR techniques by at least 85%, without the need for costly statement-level fault location information. Furthermore, SRepair successfully fixes 32 multi-function bugs in the Defects4J dataset, which, to the best of our knowledge, is a first for any APR technique.
Related papers
- FastFixer: An Efficient and Effective Approach for Repairing Programming Assignments [21.848112758958543]
We propose FastFixer, an efficient and effective approach for programming assignment repair.
We first propose a novel repair-oriented fine-tuning strategy, aiming to enhance the LLM's attention towards learning how to generate the necessary patch and its associated context.
In terms of repair efficiency, FastFixer achieves an inference speedup of 16.67 times over the autoregressive decoding algorithm.
Date: 2024-10-11T10:17:02Z
- LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement [93.38736019287224]
The "LLMs-as-Instructors" framework autonomously enhances the training of smaller target models.
Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model.
Within this framework, we implement two strategies: "Learning from Error," which focuses solely on incorrect responses to tailor training data, and "Learning from Error by Contrast", which uses contrastive learning to analyze both correct and incorrect responses for a deeper understanding of errors.
Date: 2024-06-29T17:16:04Z
- CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors [8.415004837059863]
It is crucial to recognize that existing repair benchmarks may have influenced LLM training data, potentially causing data leakage.
Our work assesses the repair performance of 12 LLMs on TutorCode, measuring repair correctness (TOP-5 and AVG-5) and patch precision (RPSR).
To fully harness LLMs' conversational capabilities and the benefits of augmented information, we introduce CREF, a novel conversational semi-automatic repair framework that assists human tutors.
Date: 2024-06-20T03:36:34Z
- Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling [65.72918416258219]
Supportiveness-based Knowledge Rewriting (SKR) is a robust and pluggable knowledge rewriter inherently optimized for LLM generation.
Based on knowledge supportiveness, we first design a training data curation strategy for our rewriter model.
We then introduce the direct preference optimization (DPO) algorithm to align the generated rewrites to optimal supportiveness.
Date: 2024-06-12T11:52:35Z
- Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis [12.7034916462208]
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers.
This paper introduces an innovative APR approach called GIANTREPAIR.
GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs.
Date: 2024-06-03T05:05:12Z
- Aligning LLMs for FL-free Program Repair [14.935596175148586]
This paper investigates a new approach to adapt large language models (LLMs) to program repair.
Our core insight is that LLMs' APR capability can be greatly improved by simply aligning their output with their training objective.
Based on this insight, we designed D4C, a straightforward prompting framework for APR.
Date: 2024-04-13T02:36:40Z
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
Date: 2024-02-25T20:07:13Z
- A Closer Look at the Limitations of Instruction Tuning [52.587607091917214]
We show that Instruction Tuning (IT) fails to enhance knowledge or skills in large language models (LLMs).
We also show that popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model.
Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.
Date: 2024-02-03T04:45:25Z
- The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model [15.885824575879763]
Automatic program repair (APR) techniques have the potential to reduce manual efforts in uncovering and repairing program defects during the code review (CR) process.
However, the limited accuracy and considerable time costs associated with existing APR approaches hinder their adoption in industrial practice.
Recent advancements in Large Language Models (LLMs) have enhanced their ability to comprehend natural and programming languages, enabling them to generate patches based on review comments.
Date: 2023-12-29T06:12:15Z
- RA-DIT: Retrieval-Augmented Dual Instruction Tuning [90.98423540361946]
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores.
Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance.
We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option.
Date: 2023-10-02T17:16:26Z
- Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs.
We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal.
Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
Date: 2023-05-22T16:00:00Z