Detect-Localize-Repair: A Unified Framework for Learning to Debug with
CodeT5
- URL: http://arxiv.org/abs/2211.14875v2
- Date: Thu, 1 Dec 2022 17:28:40 GMT
- Title: Detect-Localize-Repair: A Unified Framework for Learning to Debug with
CodeT5
- Authors: Nghi D. Q. Bui, Yue Wang, Steven Hoi
- Abstract summary: We propose a novel unified Detect-Localize-Repair framework based on the pretrained programming language model CodeT5.
Our model significantly outperforms existing baselines from both NLP and software engineering domains.
- Score: 14.712753336831172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated software debugging is a crucial task for improving the productivity
of software developers. Many neural-based techniques have been proven effective
for debugging-related tasks such as bug localization and program repair (or bug
fixing). However, these techniques often focus only on either one of them or
approach them in a stage-wise manner, ignoring the mutual benefits between
them. In this work, we propose a novel unified Detect-Localize-Repair
framework, named CodeT5-DLR, based on the pretrained programming language
model CodeT5 to seamlessly address these tasks. Specifically, we propose three
objectives to adapt the generic CodeT5 for debugging: a bug detection objective
to determine whether a given code snippet is buggy or not, a bug localization
objective to identify the buggy lines, and a program repair objective to
translate the buggy code to its fixed version. We evaluate it on each of these
tasks and their combined setting on two newly collected line-level debugging
datasets in Java and Python. Extensive results show that our model
significantly outperforms existing baselines from both NLP and software
engineering domains.
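The three objectives map naturally onto CodeT5's text-to-text interface. The sketch below shows one way to wire them up with the public Salesforce/codet5-base checkpoint; the task prefixes ("detect:", "localize:", "repair:") and output formats are illustrative assumptions rather than the paper's exact design, and a checkpoint fine-tuned on all three objectives would be needed for meaningful outputs.

```python
# Minimal sketch of the detect/localize/repair formulation on top of CodeT5.
# Task prefixes and output formats are assumptions for illustration; they are
# not necessarily those used by CodeT5-DLR.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

def run_task(prefix: str, code: str, max_new_tokens: int = 256) -> str:
    inputs = tokenizer(prefix + code, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

snippet = "def add(a, b):\n    return a - b\n"

verdict = run_task("detect: ", snippet, max_new_tokens=4)   # e.g. "true"/"false"
lines = run_task("localize: ", snippet, max_new_tokens=16)  # e.g. "2"
fixed = run_task("repair: ", snippet)                       # fixed version
```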
Related papers
- A Deep Dive into Large Language Models for Automated Bug Localization and Repair [12.756202755547024]
Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR).
In this study, we take a deep dive into automated bug fixing utilizing LLMs.
Their methodological separation of bug localization and fixing across different LLMs enables effective integration of diverse contextual information.
Their proposed framework, Toggle, achieves new state-of-the-art (SOTA) performance on the CodeXGLUE code refinement benchmark.
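A minimal sketch of this localize-then-fix separation, assuming generic `localizer` and `fixer` callables standing in for two different LLM backends (prompts and output formats are illustrative, not the paper's):

```python
# Hypothetical two-stage pipeline: one model localizes the buggy line,
# a second model rewrites it with the full code as context.
from typing import Callable

def localize_then_fix(
    code: str,
    localizer: Callable[[str], str],  # expected to return a 1-based line number
    fixer: Callable[[str], str],      # expected to return a replacement line
) -> str:
    line_no = int(localizer(
        f"Return only the 1-based number of the buggy line:\n{code}"
    ))
    lines = code.splitlines()
    buggy_line = lines[line_no - 1]
    lines[line_no - 1] = fixer(
        f"Rewrite this buggy line, given its context:\n{buggy_line}\n---\n{code}"
    )
    return "\n".join(lines)
```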
arXiv Detail & Related papers (2024-04-17T17:48:18Z)
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
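A minimal sketch of the RTT idea, with a generic `llm` callable standing in for any pretrained code model (prompts are illustrative):

```python
# Round-Trip Translation repair: code -> natural language -> code. Bugs tend
# to be regularized away when code is regenerated from its description.
from typing import Callable

def round_trip_repair(buggy_code: str, llm: Callable[[str], str]) -> str:
    description = llm(
        f"Describe precisely what this function is supposed to do:\n{buggy_code}"
    )
    return llm(f"Write code implementing this description:\n{description}")
```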
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
- Leveraging Print Debugging to Improve Code Generation in Large Language Models [63.63160583432348]
Large language models (LLMs) have made significant progress in code generation tasks.
But their performance in tackling programming problems with complex data structures and algorithms remains suboptimal.
We propose an in-context learning approach that guides LLMs to debug by using a "print debug" method.
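One round of such a print-debugging loop might look like the sketch below, with `llm` standing in for any code LLM; the prompts are illustrative, and executing model-written code should of course be sandboxed:

```python
# One round of print debugging: instrument, execute, feed the trace back.
import contextlib
import io
from typing import Callable

def print_debug_round(code: str, llm: Callable[[str], str]) -> str:
    instrumented = llm(f"Add print statements to expose intermediate state:\n{code}")
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(instrumented, {})  # untrusted code: sandbox in practice
    except Exception as exc:
        buffer.write(f"\n{type(exc).__name__}: {exc}")
    trace = buffer.getvalue()
    return llm(f"Given this execution trace:\n{trace}\nFix the code:\n{code}")
```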
arXiv Detail & Related papers (2024-01-10T18:37:59Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a debugging benchmark for Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
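A minimal sketch of the retrieve-then-generate idea; the lexical Jaccard retriever below is a deliberate simplification (RAP-Gen itself uses a stronger retriever), and `generate_patch` stands in for a fine-tuned seq2seq model call:

```python
# Retrieval-augmented patch generation: prepend the most similar past
# bug-fix pair to the input before generating a patch.
from typing import Callable

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def retrieve_and_generate(
    buggy: str,
    fix_pairs: list[tuple[str, str]],      # (past_bug, past_fix) pairs
    generate_patch: Callable[[str], str],  # e.g. a fine-tuned CodeT5 call
) -> str:
    past_bug, past_fix = max(fix_pairs, key=lambda p: jaccard(buggy, p[0]))
    prompt = (
        f"retrieved bug: {past_bug}\nretrieved fix: {past_fix}\n"
        f"bug: {buggy}\nfix:"
    )
    return generate_patch(prompt)
```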
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Neural Program Repair with Program Dependence Analysis and Effective Filter Mechanism [37.70518599085677]
We present a novel neural program repair framework that adapts a general pre-trained language model for fixing single-line Java bugs.
We make the first attempt to use program slicing to extract contextual information directly related to the given buggy statement as repair ingredients from the corresponding program dependence graph.
We demonstrate the effectiveness of the approach on five benchmarks when compared with state-of-the-art baselines.
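The paper slices Java programs via their program dependence graphs; as a rough intuition only, the sketch below computes a toy line-level backward slice over Python def-use chains, ignoring control dependence entirely:

```python
# Toy backward slice: collect lines that (transitively) define the names used
# at a target line. A real PDG-based slicer also tracks control dependence.
import ast

def backward_slice(source: str, target_line: int) -> list[int]:
    tree = ast.parse(source)
    defs: dict[str, list[int]] = {}   # name -> lines that assign it
    uses: dict[int, set[str]] = {}    # line -> names read there
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.setdefault(node.id, []).append(node.lineno)
            else:
                uses.setdefault(node.lineno, set()).add(node.id)
    slice_lines, worklist = {target_line}, [target_line]
    while worklist:
        line = worklist.pop()
        for name in uses.get(line, ()):
            for def_line in defs.get(name, ()):
                if def_line < line and def_line not in slice_lines:
                    slice_lines.add(def_line)
                    worklist.append(def_line)
    return sorted(slice_lines)
```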
arXiv Detail & Related papers (2023-05-16T09:43:04Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
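A minimal sketch of the feedback loop, assuming generic `llm` and `run_tests` callables (the original work drives this with few-shot demonstrations rather than bare instructions):

```python
# Self-debugging loop: generate, execute, revise with the failure feedback.
from typing import Callable, Optional

def self_debug(
    task: str,
    llm: Callable[[str], str],
    run_tests: Callable[[str], Optional[str]],  # None if passing, else error log
    max_rounds: int = 3,
) -> str:
    program = llm(f"Solve:\n{task}")
    for _ in range(max_rounds):
        feedback = run_tests(program)
        if feedback is None:
            break  # all tests pass
        program = llm(
            f"Task:\n{task}\nYour program:\n{program}\n"
            f"It failed with:\n{feedback}\nExplain the bug and return a fix."
        )
    return program
```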
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
- FixEval: Execution-based Evaluation of Program Fixes for Programming Problems [23.987104440395576]
We introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their corresponding fixes.
FixEval offers an extensive collection of unit tests to evaluate the correctness of model-generated program fixes.
Our experiments show that match-based metrics do not reflect model-generated program fixes accurately.
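Execution-based scoring of this kind reduces to running each candidate fix against the problem's tests. A minimal sketch, assuming (stdin, expected_stdout) test pairs as in competitive programming:

```python
# A fix counts as correct only if it passes every unit test, regardless of
# how closely it matches the reference patch textually.
import os
import subprocess
import tempfile

def passes_tests(candidate_code: str, test_cases: list[tuple[str, str]]) -> bool:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code)
        path = f.name
    try:
        for stdin, expected in test_cases:
            try:
                result = subprocess.run(
                    ["python", path], input=stdin, capture_output=True,
                    text=True, timeout=5,
                )
            except subprocess.TimeoutExpired:
                return False
            if result.returncode != 0 or result.stdout.strip() != expected.strip():
                return False
        return True
    finally:
        os.unlink(path)
```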
arXiv Detail & Related papers (2022-06-15T20:18:43Z)
- Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub repositories.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
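One plausible reading of the deletion-only split is a line-level diff check, sketched below (this interpretation is an assumption, not necessarily the paper's exact definition):

```python
# A fix is "deletion-only" if the buggy-to-fixed diff removes lines but
# inserts none; everything else counts as a non-deletion fix.
import difflib

def is_deletion_only(buggy: str, fixed: str) -> bool:
    diff = difflib.ndiff(buggy.splitlines(), fixed.splitlines())
    changed = [d for d in diff if d.startswith(("+ ", "- "))]
    return bool(changed) and all(d.startswith("- ") for d in changed)
```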
arXiv Detail & Related papers (2021-04-16T05:27:04Z)