Related papers: Practical Program Repair in the Era of Large Pre-trained Language Models

URL: http://arxiv.org/abs/2210.14179v2
Date: Mon, 09 Dec 2024 20:42:29 GMT
Title: Practical Program Repair in the Era of Large Pre-trained Language Models
Authors: Chunqiu Steven Xia, Yuxiang Wei, Lingming Zhang
Abstract summary: Automated Program Repair (APR) aims to help developers automatically patch software bugs. PLMs, trained using billions of text/code tokens, can potentially help avoid this issue. We select 9 recent state-of-the-art PLMs, including both generative and infilling models, ranging from 125M to 20B in size.
Score: 13.694803023685175
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face the problem of limited patch variety, failing to fix complicated bugs. This is mainly due to the reliance on bug-fixing datasets to craft fix templates or directly predict potential patches. Large Pre-Trained Language Models (PLMs), trained using billions of text/code tokens, can potentially help avoid this issue. Very recently, researchers have directly leveraged PLMs for APR without relying on any bug-fixing datasets. Meanwhile, such existing work either failed to include state-of-the-art PLMs or was not evaluated on realistic datasets. In this work, we perform the first extensive study on directly applying PLMs for APR. We select 9 recent state-of-the-art PLMs, including both generative and infilling models, ranging from 125M to 20B in size. We designed 3 different repair settings to evaluate the different ways we can use PLMs to generate patches. We apply the PLMs under these repair settings on 5 datasets across 3 different languages and compare different PLMs in the number of bugs fixed, generation speed and compilation rate. Our study demonstrates that directly applying state-of-the-art PLMs can already substantially outperform all existing APR techniques on all our datasets. Among the studied PLMs, the scaling effect exists for APR where larger models tend to achieve better performance. Also, we show for the first time that suffix code after the buggy line (adopted in infilling-style APR) is important in not only generating more fixes but more patches with higher compilation rate. Besides patch generation, the PLMs consider correct patches to be more natural than other ones, and can even be leveraged for effective patch ranking or patch correctness checking.

Related papers

Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models [48.42142115255159]
We release BlockWorld-Repairs: a dataset of multi-modal TPR sequences in an instruction-following manipulation task. We evaluate several state-of-the-art Vision and Language Models (VLM) across multiple settings, focusing on their capability to process and accurately respond to TPRs. Our results suggest that these models are not yet ready to be deployed in multi-modal collaborative settings.
arXiv Detail & Related papers (2024-09-21T21:06:25Z)
Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis [12.7034916462208]
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers. This paper introduces an innovative APR approach called GIANTREPAIR. Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs.
arXiv Detail & Related papers (2024-06-03T05:05:12Z)
Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models [9.454475517867817]
We propose a patch-naturalness measurement, entropy-delta, to improve the efficiency of template-based repair techniques. Our proposed method can rank correct patches more effectively than state-of-the-art machine learning tools.
arXiv Detail & Related papers (2024-04-23T17:12:45Z)
Aligning LLMs for FL-free Program Repair [14.935596175148586]
This paper investigates a new approach to adapt large language models (LLMs) to program repair. Our core insight is that LLM's APR capability can be greatly improved by simply aligning the output to their training objective. Based on this insight, we designed D4C, a straightforward prompting framework for APR.
arXiv Detail & Related papers (2024-04-13T02:36:40Z)
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back. Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs. We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models [75.75038268227554]
Self-Checker is a framework comprising a set of plug-and-play modules that facilitate fact-checking. This framework provides a fast and efficient way to construct fact-checking systems in low-resource environments.
arXiv Detail & Related papers (2023-05-24T01:46:07Z)
Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT [13.632199062382746]
Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR. We propose ChatRepair, the first fully automated conversation-driven APR approach.
arXiv Detail & Related papers (2023-04-01T20:57:33Z)
Revisiting the Plastic Surgery Hypothesis via Large Language Models [13.488029636215089]
We propose FitRepair, which combines the direct usage of Large Language Models with two domain-specific fine-tuning strategies and one prompting strategy for more powerful APR. Our experiments on the widely studied Defects4j 1.2 and 2.0 datasets show that FitRepair fixes 89 and 44 bugs.
arXiv Detail & Related papers (2023-03-18T20:33:46Z)
Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks. It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned. We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z)
CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference. We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch. We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.