Copiloting the Copilots: Fusing Large Language Models with Completion
Engines for Automated Program Repair
- URL: http://arxiv.org/abs/2309.00608v3
- Date: Wed, 8 Nov 2023 22:24:26 GMT
- Title: Copiloting the Copilots: Fusing Large Language Models with Completion
Engines for Automated Program Repair
- Authors: Yuxiang Wei, Chunqiu Steven Xia, Lingming Zhang
- Abstract summary: Large Language Models (LLMs) have been shown to be helpful "copilots" in assisting developers with various coding tasks.
We propose Repilot, a general code generation framework to further copilot the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process.
Our evaluation on a subset of the widely-used Defects4j 1.2 and 2.0 datasets shows that Repilot outperforms state-of-the-art techniques by fixing 27% and 47% more bugs, respectively.
- Score: 15.391586175711907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: During Automated Program Repair (APR), it can be challenging to synthesize
correct patches for real-world systems in general-purpose programming
languages. Recent Large Language Models (LLMs) have been shown to be helpful
"copilots" in assisting developers with various coding tasks, and have also
been directly applied for patch synthesis. However, most LLMs treat programs as
sequences of tokens, meaning that they are ignorant of the underlying semantics
constraints of the target programming language. This results in plenty of
statically invalid generated patches, impeding the practicality of the
technique. Therefore, we propose Repilot, a general code generation framework
to further copilot the AI "copilots" (i.e., LLMs) by synthesizing more valid
patches during the repair process. Our key insight is that many LLMs produce
outputs autoregressively (i.e., token by token), resembling human writing
programs, which can be significantly boosted and guided through a Completion
Engine. Repilot synergistically synthesizes a candidate patch through the
interaction between an LLM and a Completion Engine, which 1) prunes away
infeasible tokens suggested by the LLM and 2) proactively completes the token
based on the suggestions provided by the Completion Engine. Our evaluation on a
subset of the widely-used Defects4j 1.2 and 2.0 datasets shows that Repilot
outperforms state-of-the-art techniques by fixing 27% and 47% more bugs,
respectively. Moreover, Repilot produces more valid and correct patches than
the base LLM with the same budget. While we focus on leveraging Repilot for APR
in this work, the overall approach is also generalizable to other code
generation tasks.
Related papers
- MarsCode Agent: AI-native Automated Bug Fixing [7.909344108948294]
We introduce MarsCode Agent, a novel framework that leverages large language models to automatically identify and repair bugs in software code.
Our approach follows a systematic process of planning, bug reproduction, fault localization, candidate patch generation, and validation to ensure high-quality bug fixes.
Our results show that MarsCode Agent achieves a high success rate in bug fixing compared to most of the existing automated approaches.
arXiv Detail & Related papers (2024-09-02T02:24:38Z) - SpecRover: Code Intent Extraction via LLMs [7.742980618437681]
specification inference can be useful for producing high quality program patches.
Our approach SpecRover (AutoCodeRover-v2) is built on the open-source LLM agent AutoCodeRover.
In an evaluation on the full SWE-Bench consisting of 2294 GitHub issues, it shows more than 50% improvement in efficacy over AutoCodeRover.
arXiv Detail & Related papers (2024-08-05T04:53:01Z) - Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs)
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z) - Automated Program Repair: Emerging trends pose and expose problems for benchmarks [7.437224586066947]
Large language models (LLMs) are used to generate software patches.
Evaluations and comparisons must take care to ensure that results are valid and likely to generalize.
This is especially true for LLMs, whose large and often poorly-disclosed training datasets may include problems on which they are evaluated.
arXiv Detail & Related papers (2024-05-08T23:09:43Z) - Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models [9.454475517867817]
We propose a patch-naturalness measurement, entropy-delta, to improve the efficiency of template-based repair techniques.
Our proposed method can rank correct patches more effectively than state-of-the-art machine learning tools.
arXiv Detail & Related papers (2024-04-23T17:12:45Z) - NExT: Teaching Large Language Models to Reason about Code Execution [50.93581376646064]
Large language models (LLMs) of code are typically trained on the surface textual form of programs.
We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z) - A Novel Approach for Automatic Program Repair using Round-Trip
Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z) - Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - Conversational Automated Program Repair [10.071615423169902]
We propose a new paradigm for program repair that alternates between patch generation and validation in a conversational manner.
We leverage the long-term context window of Large Pre-Trained Language Models to not only avoid generating previously incorrect patches but also incorporate validation feedback to help the model understand the semantic meaning of the program under test.
arXiv Detail & Related papers (2023-01-30T19:22:36Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.