DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and
Code Skeletons
- URL: http://arxiv.org/abs/2105.09352v1
- Date: Wed, 19 May 2021 18:40:16 GMT
- Title: DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and
Code Skeletons
- Authors: Dawn Drain, Colin B. Clement, Guillermo Serrato, and Neel Sundaresan
- Abstract summary: We present an approach to automated debugging using large, pretrained transformers.
We start by training a bug-creation model on reversed commit data for the purpose of generating synthetic bugs.
Next, we focus on 10K repositories for which we can execute tests, and create buggy versions of all functions that are covered by passing tests.
- Score: 5.564793925574796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The joint task of bug localization and program repair is an integral part of
the software development process. In this work we present DeepDebug, an
approach to automated debugging using large, pretrained transformers. We begin
by training a bug-creation model on reversed commit data for the purpose of
generating synthetic bugs. We apply these synthetic bugs toward two ends.
First, we directly train a backtranslation model on all functions from 200K
repositories. Next, we focus on 10K repositories for which we can execute
tests, and create buggy versions of all functions in those repositories that
are covered by passing tests. This provides us with rich debugging information
such as stack traces and print statements, which we use to finetune our model
which was pretrained on raw source code. Finally, we strengthen all our models
by expanding the context window beyond the buggy function itself, and adding a
skeleton consisting of that function's parent class, imports, signatures,
docstrings, and method bodies, in order of priority. On the QuixBugs benchmark,
we increase the total number of fixes found by over 50%, while also decreasing
the false positive rate from 35% to 5% and decreasing the timeout from six
hours to one minute. On our own benchmark of executable tests, our model fixes
68% of all bugs on its first attempt without using traces, and after adding
traces it fixes 75% on first attempt. We will open-source our framework and
validation set for evaluating on executable tests.
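To make the skeleton idea concrete, the sketch below shows one way to assemble such a context window in Python: the buggy function is kept verbatim while imports, the enclosing class header and docstring, and sibling methods (reduced to signatures plus first docstring lines) are gathered around it. This is a minimal sketch, assuming a single-file module and Python 3.9+ for ast.unparse; the helper names (build_skeleton, signature_line) are hypothetical, and the paper's priority ordering and token budgeting are not reproduced here.

```python
# A minimal sketch of skeleton construction, assuming a single-file Python
# module and Python 3.9+ (ast.unparse). Helper names are illustrative only.
import ast
import textwrap


def signature_line(fn: ast.FunctionDef) -> str:
    """Render 'def name(args):' plus the first docstring line, if any."""
    # Positional args only; defaults/*args/**kwargs omitted for brevity.
    args = ", ".join(a.arg for a in fn.args.args)
    line = f"def {fn.name}({args}):"
    doc = ast.get_docstring(fn)
    if doc:
        line += f'\n    """{doc.splitlines()[0]}"""'
    return line


def build_skeleton(source: str, buggy_name: str) -> str:
    """Keep the buggy function verbatim; reduce everything else to context."""
    tree = ast.parse(source)
    parts = []
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            parts.append(ast.unparse(node))
        elif isinstance(node, ast.ClassDef):
            parts.append(f"class {node.name}:")
            class_doc = ast.get_docstring(node)
            if class_doc:
                parts.append(f'    """{class_doc.splitlines()[0]}"""')
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    body = (ast.unparse(item) if item.name == buggy_name
                            else signature_line(item))
                    parts.append(textwrap.indent(body, "    "))
        elif isinstance(node, ast.FunctionDef):
            parts.append(ast.unparse(node) if node.name == buggy_name
                         else signature_line(node))
    # Note: signature-only stubs are model context, not executable code.
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = '''
import math

class Geometry:
    """Toy class with one buggy method."""

    def area(self, r):
        """Area of a circle."""
        return math.pi * r * r

    def circumference(self, r):
        """Circumference of a circle."""
        return math.pi * r  # bug: missing factor of 2
'''
    print(build_skeleton(demo, "circumference"))
```

In the paper's setup, a context like this, together with the stack trace from the failing test when one is available, is what the finetuned transformer conditions on when generating a fix.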
Related papers
- Checker Bug Detection and Repair in Deep Learning Libraries [30.494018435420706]
Checker bugs in Deep Learning (DL) libraries are critical yet not well-explored.
We present the first comprehensive study of DL checker bugs in two widely-used DL libraries.
We propose ZeroGuard, a proof-of-concept JAXGuard-based tool to detect and fix checker bugs in DL libraries.
arXiv Detail & Related papers (2024-10-09T00:48:12Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5 [14.712753336831172]
We propose a novel unified Detect-Localize-Repair framework based on the pretrained programming language model CodeT5.
Our model significantly outperforms existing baselines from both NLP and software engineering domains.
arXiv Detail & Related papers (2022-11-27T16:11:29Z)
- DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation [70.96868419971756]
DS-1000 is a code generation benchmark with a thousand data science problems spanning seven Python libraries.
First, our problems reflect diverse, realistic, and practical use cases since we collected them from StackOverflow.
Second, our automatic evaluation is highly specific (reliable): across all Codex-predicted solutions that our evaluation accepts, only 1.8% are incorrect.
arXiv Detail & Related papers (2022-11-18T17:20:27Z)
- On Distribution Shift in Learning-based Bug Detectors [4.511923587827301]
We train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution.
We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution.
arXiv Detail & Related papers (2022-04-21T12:17:22Z)
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
- Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
We use a critic to check the fixer's output on real bad inputs and add good (fixed) outputs to the training data.
Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
arXiv Detail & Related papers (2021-06-11T20:31:04Z)
- Self-Supervised Bug Detection and Repair [27.46717890823656]
We present BugLab, an approach for self-supervised learning of bug detection and repair.
A Python implementation of BugLab improves by up to 30% upon baseline methods on a test dataset of 2374 real-life bugs.
arXiv Detail & Related papers (2021-05-26T18:41:05Z)
- Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub repositories.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
arXiv Detail & Related papers (2021-04-16T05:27:04Z)
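As a rough illustration of the non-deletion vs. deletion-only distinction mentioned in the entry above, one could classify a candidate patch as deletion-only when its diff removes lines without adding any. A minimal sketch under that assumption (not the authors' exact criterion), with hypothetical helper names:

```python
import difflib


def is_deletion_only(buggy: str, candidate_fix: str) -> bool:
    """True if the candidate patch only removes lines from the buggy code."""
    diff = difflib.ndiff(buggy.splitlines(), candidate_fix.splitlines())
    # A pure deletion never introduces '+ ' lines in an ndiff.
    return not any(line.startswith("+ ") for line in diff)


if __name__ == "__main__":
    buggy = "x = compute()\nprint(x)\nx = None\n"
    fix = "x = compute()\nprint(x)\n"
    print(is_deletion_only(buggy, fix))  # True: the patch only deletes a line
```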