A Novel Approach for Automatic Program Repair using Round-Trip
Translation with Large Language Models
- URL: http://arxiv.org/abs/2401.07994v1
- Date: Mon, 15 Jan 2024 22:36:31 GMT
- Title: A Novel Approach for Automatic Program Repair using Round-Trip
Translation with Large Language Models
- Authors: Fernando Vallecillos Ruiz and Anastasiia Grishina and Max Hort and
Leon Moonen
- Abstract summary: Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
- Score: 50.86686630756207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research shows that grammatical mistakes in a sentence can be corrected by
translating it to another language and back using neural machine translation
with language models. We investigate whether this correction capability of
Large Language Models (LLMs) extends to Automatic Program Repair (APR). Current
generative models for APR are pre-trained on source code and fine-tuned for
repair. This paper proposes bypassing the fine-tuning step and using Round-Trip
Translation (RTT): translation of code from one programming language to another
programming or natural language, and back. We hypothesize that RTT with LLMs
restores the most commonly seen patterns in code during pre-training, i.e.,
performs a regression toward the mean, which removes bugs as they are a form of
noise w.r.t. the more frequent, natural, bug-free code in the training data. To
test this hypothesis, we employ eight recent LLMs pre-trained on code,
including the latest GPT versions, and four common program repair benchmarks in
Java. We find that RTT with English as an intermediate language repaired 101 of
164 bugs with GPT-4 on the HumanEval-Java dataset. Moreover, 46 of these are
unique bugs that are not repaired by other LLMs fine-tuned for APR. Our
findings highlight the viability of round-trip translation with LLMs as a
technique for automated program repair and its potential for research in
software engineering.
Keywords: automated program repair, large language model, machine translation
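The RTT pipeline described in the abstract can be illustrated with a short sketch: the buggy code is first translated into a natural-language description, then translated back into code, and the back-translation serves as the candidate patch. This is a minimal sketch only; the generic `llm` callable, the prompt wording, and the single-candidate setup are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of Round-Trip Translation (RTT) for program repair, assuming a
# generic `llm` callable that maps a prompt string to a completion string.
from typing import Callable

def round_trip_repair(buggy_java: str, llm: Callable[[str], str]) -> str:
    """Translate Java -> English -> Java; the back-translation is the candidate patch."""
    # Forward pass: summarize the (buggy) code in natural language.
    description = llm(
        "Describe in plain English what the following Java method should do:\n\n"
        + buggy_java
    )
    # Backward pass: regenerate the method from the description alone.
    # The paper's hypothesis is that the LLM emits the most "natural", typically
    # bug-free code matching the description, i.e. a regression toward the mean
    # of its pre-training data.
    candidate = llm(
        "Write a single Java method that implements this description:\n\n"
        + description
    )
    return candidate

# In the paper's setting, each candidate would then be compiled and run against the
# benchmark's test suite (e.g. HumanEval-Java or Defects4J) to decide whether the
# bug counts as repaired.
```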
Related papers
- Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models [9.454475517867817]
We propose a patch-naturalness measurement, entropy-delta, to improve the efficiency of template-based repair techniques.
Our proposed method can rank correct patches more effectively than state-of-the-art machine learning tools.
arXiv Detail & Related papers (2024-04-23T17:12:45Z)
- T5APR: Empowering Automated Program Repair across Languages through Checkpoint Ensemble [2.7036595757881323]
We propose T5APR, a novel neural program repair approach that provides a unified solution for bug fixing across multiple programming languages.
T5APR correctly fixes 1,985 bugs, including 1,442 bugs that none of the compared techniques has fixed.
arXiv Detail & Related papers (2023-09-27T15:54:08Z)
- Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM, then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose RAP-Gen, a novel Retrieval-Augmented Patch Generation framework that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Enhancing Automated Program Repair through Fine-tuning and Prompt Engineering [2.3826139428423576]
Sequence-to-sequence models have been used to transform erroneous programs into correct ones when trained with a large enough dataset.
Some recent studies have provided strong empirical evidence that code review can further improve program repair.
We investigate whether the inherent knowledge of programming languages (PL) and natural language (NL) in pre-trained language models can be utilized to improve automated program repair.
arXiv Detail & Related papers (2023-04-16T17:29:51Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (by 4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results across the evaluated benchmarks.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
- Conversational Automated Program Repair [10.071615423169902]
We propose a new paradigm for program repair that alternates between patch generation and validation in a conversational manner.
We leverage the long-term context window of Large Pre-Trained Language Models not only to avoid generating previously incorrect patches but also to incorporate validation feedback that helps the model understand the semantic meaning of the program under test (a minimal sketch of such a generate-and-validate loop follows this list).
arXiv Detail & Related papers (2023-01-30T19:22:36Z)
- AVATAR: A Parallel Corpus for Java-Python Program Translation [77.86173793901139]
Program translation refers to migrating source code from one language to another.
We present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python.
arXiv Detail & Related papers (2021-08-26T05:44:20Z)
- ENCORE: Ensemble Learning using Convolution Neural Machine Translation for Automatic Program Repair [7.026028136636735]
We propose ENCORE, a new generate-and-validate (G&V) program repair technique.
It uses ensemble learning on convolutional neural machine translation (NMT) models to automatically fix bugs in multiple programming languages.
ENCORE is the first G&V repair technique to be applied to four popular programming languages.
arXiv Detail & Related papers (2019-06-20T15:25:16Z)
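The conversational repair paradigm summarized in the Conversational Automated Program Repair entry above can be sketched as a loop that alternates patch generation with test-based validation and feeds failure logs back into the dialogue. The `llm` chat interface and the `run_tests` oracle below are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Minimal sketch of a conversational generate-and-validate repair loop, in the spirit
# of the "Conversational Automated Program Repair" entry above. The `llm` chat function
# and `run_tests` oracle are assumed placeholders, not the paper's actual interfaces.
from typing import Callable, List, Optional, Tuple

def conversational_repair(
    buggy_code: str,
    llm: Callable[[List[dict]], str],              # maps a chat history to the next reply
    run_tests: Callable[[str], Tuple[bool, str]],  # returns (all_passed, failure_log)
    max_turns: int = 5,
) -> Optional[str]:
    history = [{
        "role": "user",
        "content": "Fix the bug in this function and return only the fixed code:\n" + buggy_code,
    }]
    for _ in range(max_turns):
        patch = llm(history)
        passed, log = run_tests(patch)
        if passed:
            return patch
        # Feed the validation result back so the model avoids repeating the same mistake.
        history.append({"role": "assistant", "content": patch})
        history.append({
            "role": "user",
            "content": "That patch still fails these tests:\n" + log + "\nPlease try again.",
        })
    return None
```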