Generating Bug-Fixes Using Pretrained Transformers
- URL: http://arxiv.org/abs/2104.07896v1
- Date: Fri, 16 Apr 2021 05:27:04 GMT
- Title: Generating Bug-Fixes Using Pretrained Transformers
- Authors: Dawn Drain, Chen Wu, Alexey Svyatkovskiy, Neel Sundaresan
- Abstract summary: We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub repositories.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
- Score: 11.012132897417592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting and fixing bugs are two of the most important yet frustrating parts
of the software development cycle. Existing bug detection tools are based
mainly on static analyzers, which rely on mathematical logic and symbolic
reasoning about program execution to detect common types of bugs. Fixing
bugs is typically left to the developer. In this work we introduce
DeepDebug: a data-driven program repair approach which learns to detect and fix
bugs in Java methods mined from real-world GitHub repositories. We frame
bug-patching as a sequence-to-sequence learning task consisting of two steps:
(i) denoising pretraining, and (ii) supervised finetuning on the target
translation task. We show that pretraining on source code programs improves the
number of patches found by 33% as compared to supervised training from scratch,
while domain-adaptive pretraining from natural language to code further
improves the accuracy by another 32%. We refine the standard accuracy
evaluation metric into non-deletion and deletion-only fixes, and show that our
best model generates 75% more non-deletion fixes than the previous state of the
art. In contrast to prior work, we attain our best results when generating raw
code, as opposed to working with abstracted code that tends to only benefit
smaller-capacity models. Finally, we observe a subtle improvement from adding
syntax embeddings alongside the standard positional embeddings, as well as
from adding an auxiliary task to predict each token's syntactic class. Despite
focusing on Java, our approach is language agnostic, requiring only a
general-purpose parser such as tree-sitter.
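As a concrete illustration of the two-step recipe, the sketch below finetunes a generic denoising-pretrained encoder-decoder on one buggy-to-fixed pair. It is a minimal sketch, not the authors' pipeline: the facebook/bart-base checkpoint, the toy Java pair, and all hyperparameters are illustrative stand-ins.

```python
# Supervised finetuning: bug patching framed as seq2seq translation.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# A denoising-pretrained encoder-decoder stands in for the paper's model.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# One buggy -> fixed pair; the paper mines such Java methods from
# real-world GitHub repositories.
buggy = "public int max(int a, int b) { return a < b ? a : b; }"
fixed = "public int max(int a, int b) { return a > b ? a : b; }"

# One training step: cross-entropy on the fixed method as the target.
inputs = tokenizer(buggy, return_tensors="pt")
labels = tokenizer(fixed, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()

# Inference: beam search proposes candidate patches in raw code.
with torch.no_grad():
    patch = model.generate(**inputs, num_beams=5, max_length=64)
print(tokenizer.decode(patch[0], skip_special_tokens=True))
```

The syntax embeddings and syntactic-class auxiliary loss mentioned in the abstract would slot in at the embedding layer and the output head respectively; they are omitted here for brevity.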
Related papers
- LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier.
Our experiments reveal that a small set of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose RAP-Gen, a novel Retrieval-Augmented Patch Generation framework that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages: the TFix benchmark in JavaScript, and the Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
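To make the retrieval-augmented idea above concrete, here is a rough sketch under stated assumptions: difflib similarity stands in for RAP-Gen's retriever, the stored bug-fix pairs are invented, and the actual CodeT5 generation call is omitted.

```python
# Sketch: retrieve the most similar past bug-fix pair and prepend it to
# the buggy query as guidance for a seq2seq patch generator.
import difflib

# Hypothetical store of previous bug-fix pairs (invented examples).
bug_fix_pairs = [
    ("if (x = 1) {", "if (x == 1) {"),
    ("for (int i = 0; i <= n; i++)", "for (int i = 0; i < n; i++)"),
]

def retrieve(buggy: str) -> tuple:
    """Return the stored pair whose buggy side best matches the query."""
    return max(bug_fix_pairs,
               key=lambda p: difflib.SequenceMatcher(None, buggy, p[0]).ratio())

query = "if (count = MAX) {"
guide_bug, guide_fix = retrieve(query)

# The retrieved fix pattern becomes extra context for the generator
# (e.g. CodeT5), which then emits the candidate patch.
model_input = f"bug: {guide_bug} fix: {guide_fix} query: {query}"
print(model_input)
```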
- Pre-training Code Representation with Semantic Flow Graph for Effective Bug Localization [4.159296619915587]
We propose a novel directed, multiple-label code graph representation named Semantic Flow Graph (SFG).
We show that our method achieves state-of-the-art performance in bug localization.
arXiv Detail & Related papers (2023-08-24T13:25:17Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
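The syntax-error portion of such a static check can be approximated in a few lines with Python's built-in ast module; this minimal sketch covers only parse failures, whereas the paper's framework also quantifies subtler static errors such as undefined names.

```python
# Count completions that fail to parse, using the AST as the oracle.
import ast

completions = [
    "def add(a, b):\n    return a + b",  # parses cleanly
    "def add(a, b):\n    return a +",    # syntax error
]

errors = 0
for code in completions:
    try:
        ast.parse(code)
    except SyntaxError as e:
        errors += 1
        print(f"line {e.lineno}: {e.msg}")

print(f"{errors}/{len(completions)} completions contain static errors")
```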
- Neural Program Repair with Program Dependence Analysis and Effective Filter Mechanism [37.70518599085677]
We present a novel neural program repair framework that adapts a general pre-trained language model to fix single-line Java bugs.
We make the first attempt to use program slicing to extract, from the corresponding program dependence graph, contextual information directly related to the given buggy statement as repair ingredients.
We demonstrate the effectiveness of our framework on five benchmarks when compared with state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-16T09:43:04Z)
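As a toy illustration of slicing out repair ingredients, the sketch below computes a backward slice over a hand-built dependence graph with networkx; the statements and edges are invented, and a real pipeline would derive the graph from the buggy program itself.

```python
# Backward slice of a buggy statement over a program dependence graph.
import networkx as nx

# Statements: s1: x = read(); s2: y = x + 1; s3: z = 5; s4: use(y) <- buggy
pdg = nx.DiGraph([("s1", "s2"), ("s2", "s4")])
pdg.add_node("s3")  # s3 has no influence on the buggy statement

buggy = "s4"
# The slice keeps exactly the statements the bug depends on.
repair_context = nx.ancestors(pdg, buggy) | {buggy}
print(sorted(repair_context))  # ['s1', 's2', 's4'] -- s3 is pruned away
```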
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
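A minimal sketch of such a debug loop, assuming a hypothetical llm() completion function (a stub here, not a real API): the candidate program is executed, and any traceback is fed back to the model for another attempt.

```python
# Generate, run, and iteratively repair a program from its own errors.
import subprocess
import sys
import tempfile

def llm(prompt: str) -> str:
    """Hypothetical code model call; wire this to any LLM backend."""
    raise NotImplementedError

def self_debug(task: str, max_rounds: int = 3) -> str:
    program = llm(task)
    for _ in range(max_rounds):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(program)
        result = subprocess.run([sys.executable, f.name],
                                capture_output=True, text=True, timeout=10)
        if result.returncode == 0:
            return program  # ran cleanly; accept the candidate
        # Feedback step: show the model its own program and traceback.
        program = llm(f"{task}\n\nYour program:\n{program}\n"
                      f"It failed with:\n{result.stderr}\nFix it.")
    return program
```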
- Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5 [14.712753336831172]
We propose a novel unified Detect-Localize-Repair framework based on the pretrained programming language model CodeT5.
Our model significantly outperforms existing baselines from both NLP and software engineering domains.
arXiv Detail & Related papers (2022-11-27T16:11:29Z)
- BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
- On Distribution Shift in Learning-based Bug Detectors [4.511923587827301]
We train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution.
We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution.
arXiv Detail & Related papers (2022-04-21T12:17:22Z)
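The two-phase recipe reads as two passes of an ordinary training loop; in the sketch below, a tiny classifier and random tensors are placeholders for a real bug detector and the synthetic and real bug datasets.

```python
# Phase 1 on plentiful synthetic bugs, phase 2 on scarce real bugs.
import torch
from torch import nn

def train_phase(model, batches, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in batches:
            opt.zero_grad()
            loss_fn(model(x).squeeze(-1), y).backward()
            opt.step()

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

# Phase 1: adapt the model to the bug detection task on synthetic data.
synthetic = [(torch.randn(8, 16), torch.randint(0, 2, (8,)).float())
             for _ in range(50)]
train_phase(model, synthetic, epochs=3, lr=1e-3)

# Phase 2: drive the model toward the real bug distribution with a
# smaller dataset and a lower learning rate.
real = [(torch.randn(8, 16), torch.randint(0, 2, (8,)).float())
        for _ in range(5)]
train_phase(model, real, epochs=10, lr=1e-4)
```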
- Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
We use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data.
Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
arXiv Detail & Related papers (2021-06-11T20:31:04Z)
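One round of the BIFI loop can be sketched as follows; the parse-based critic mirrors the paper's use of a checker, while the string-edit fixer and breaker are toy stand-ins for the trained seq2seq models.

```python
# One BIFI round: verify fixer outputs with the critic, then use the
# breaker on verified fixes to synthesize realistic paired data.
import ast

def critic(code: str) -> bool:
    """Accept code iff it parses."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def fixer(bad: str) -> str:
    # Toy fix: close one unbalanced parenthesis.
    return bad + ")" if bad.count("(") > bad.count(")") else bad

def breaker(good: str) -> str:
    # Toy break: chop the final character.
    return good[:-1]

real_bad = ["print('hello'", "print(1 + )"]

# Idea 1: run the fixer on real bad inputs; the critic keeps only
# verified fixes, which become new (bad, fixed) training pairs.
fixer_data = [(b, fixer(b)) for b in real_bad if critic(fixer(b))]

# Idea 2: break the verified fixes to get realistic (good, bad) pairs;
# iterating both steps grows the paired data for breaker and fixer.
breaker_data = [(g, breaker(g)) for _, g in fixer_data
                if not critic(breaker(g))]

print(fixer_data)    # [("print('hello'", "print('hello')")]
print(breaker_data)  # [("print('hello')", "print('hello'")]
```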