Detect-Localize-Repair: A Unified Framework for Learning to Debug with
CodeT5
- URL: http://arxiv.org/abs/2211.14875v2
- Date: Thu, 1 Dec 2022 17:28:40 GMT
- Title: Detect-Localize-Repair: A Unified Framework for Learning to Debug with
CodeT5
- Authors: Nghi D. Q. Bui, Yue Wang, Steven Hoi
- Abstract summary: We propose a novel unified Detect-Localize-Repair framework based on the pretrained programming language model CodeT5.
Our model significantly outperforms existing baselines from both NLP and software engineering domains.
- Score: 14.712753336831172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated software debugging is a crucial task for improving the productivity
of software developers. Many neural-based techniques have been proven effective
for debugging-related tasks such as bug localization and program repair (or bug
fixing). However, these techniques often focus only on either one of them or
approach them in a stage-wise manner, ignoring the mutual benefits between
them. In this work, we propose a novel unified Detect-Localize-Repair
framework, named CodeT5-DLR, based on the pretrained programming language
model CodeT5 to seamlessly address these tasks. Specifically, we propose three
objectives to adapt the generic CodeT5 for debugging: a bug detection objective
to determine whether a given code snippet is buggy or not, a bug localization
objective to identify the buggy lines, and a program repair objective to
translate the buggy code to its fixed version. We evaluate it on each of these
tasks and their combined setting on two newly collected line-level debugging
datasets in Java and Python. Extensive results show that our model
significantly outperforms existing baselines from both NLP and software
engineering domains.
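The three objectives map naturally onto CodeT5's text-to-text interface. The sketch below shows one way to wire them up with the public Salesforce/codet5-base checkpoint; the task prefixes ("detect:", "localize:", "repair:") and output formats are illustrative assumptions rather than the paper's exact design, and a checkpoint fine-tuned on all three objectives would be needed for meaningful outputs.

```python
# Minimal sketch of the detect/localize/repair formulation on top of CodeT5.
# Task prefixes and output formats are assumptions for illustration; they are
# not necessarily those used by CodeT5-DLR.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

def run_task(prefix: str, code: str, max_new_tokens: int = 256) -> str:
    inputs = tokenizer(prefix + code, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

snippet = "def add(a, b):\n    return a - b\n"

verdict = run_task("detect: ", snippet, max_new_tokens=4)   # e.g. "true"/"false"
lines = run_task("localize: ", snippet, max_new_tokens=16)  # e.g. "2"
fixed = run_task("repair: ", snippet)                       # fixed version
```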
Related papers
- A Deep Dive into Large Language Models for Automated Bug Localization and Repair [12.756202755547024]
Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR).
In this study, we take a deep dive into automated bug fixing utilizing LLMs.
Their methodological separation of bug localization and fixing across different LLMs enables effective integration of diverse contextual information.
Their proposed framework, Toggle, achieves new state-of-the-art (SOTA) performance on the CodeXGLUE code refinement benchmark.
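A minimal sketch of this localize-then-fix separation, assuming generic `localizer` and `fixer` callables standing in for two different LLM backends (prompts and output formats are illustrative, not the paper's):

```python
# Hypothetical two-stage pipeline: one model localizes the buggy line,
# a second model rewrites it with the full code as context.
from typing import Callable

def localize_then_fix(
    code: str,
    localizer: Callable[[str], str],  # expected to return a 1-based line number
    fixer: Callable[[str], str],      # expected to return a replacement line
) -> str:
    line_no = int(localizer(
        f"Return only the 1-based number of the buggy line:\n{code}"
    ))
    lines = code.splitlines()
    buggy_line = lines[line_no - 1]
    lines[line_no - 1] = fixer(
        f"Rewrite this buggy line, given its context:\n{buggy_line}\n---\n{code}"
    )
    return "\n".join(lines)
```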
arXiv Detail & Related papers (2024-04-17T17:48:18Z)
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
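A minimal sketch of the RTT idea, with a generic `llm` callable standing in for any pretrained code model (prompts are illustrative):

```python
# Round-Trip Translation repair: code -> natural language -> code. Bugs tend
# to be regularized away when code is regenerated from its description.
from typing import Callable

def round_trip_repair(buggy_code: str, llm: Callable[[str], str]) -> str:
    description = llm(
        f"Describe precisely what this function is supposed to do:\n{buggy_code}"
    )
    return llm(f"Write code implementing this description:\n{description}")
```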
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
- Leveraging Print Debugging to Improve Code Generation in Large Language Models [63.63160583432348]
Large language models (LLMs) have made significant progress in code generation tasks.
But their performance in tackling programming problems with complex data structures and algorithms remains suboptimal.
We propose an in-context learning approach that guides LLMs to debug by using a "print debug" method.
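One round of such a print-debugging loop might look like the sketch below, with `llm` standing in for any code LLM; the prompts are illustrative, and executing model-written code should of course be sandboxed:

```python
# One round of print debugging: instrument, execute, feed the trace back.
import contextlib
import io
from typing import Callable

def print_debug_round(code: str, llm: Callable[[str], str]) -> str:
    instrumented = llm(f"Add print statements to expose intermediate state:\n{code}")
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(instrumented, {})  # untrusted code: sandbox in practice
    except Exception as exc:
        buffer.write(f"\n{type(exc).__name__}: {exc}")
    trace = buffer.getvalue()
    return llm(f"Given this execution trace:\n{trace}\nFix the code:\n{code}")
```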
arXiv Detail & Related papers (2024-01-10T18:37:59Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a debugging benchmark for Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
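A minimal sketch of the retrieve-then-generate idea; the lexical Jaccard retriever below is a deliberate simplification (RAP-Gen itself uses a stronger retriever), and `generate_patch` stands in for a fine-tuned seq2seq model call:

```python
# Retrieval-augmented patch generation: prepend the most similar past
# bug-fix pair to the input before generating a patch.
from typing import Callable

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def retrieve_and_generate(
    buggy: str,
    fix_pairs: list[tuple[str, str]],      # (past_bug, past_fix) pairs
    generate_patch: Callable[[str], str],  # e.g. a fine-tuned CodeT5 call
) -> str:
    past_bug, past_fix = max(fix_pairs, key=lambda p: jaccard(buggy, p[0]))
    prompt = (
        f"retrieved bug: {past_bug}\nretrieved fix: {past_fix}\n"
        f"bug: {buggy}\nfix:"
    )
    return generate_patch(prompt)
```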
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Neural Program Repair with Program Dependence Analysis and Effective Filter Mechanism [37.70518599085677]
We present a novel neural program repair framework that adapts a general pre-trained language model for fixing single-line Java bugs.
We make the first attempt to use program slicing to extract contextual information directly related to the given buggy statement as repair ingredients from the corresponding program dependence graph.
We demonstrate the effectiveness of the approach on five benchmarks when compared with state-of-the-art baselines.
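The paper slices Java programs via their program dependence graphs; as a rough intuition only, the sketch below computes a toy line-level backward slice over Python def-use chains, ignoring control dependence entirely:

```python
# Toy backward slice: collect lines that (transitively) define the names used
# at a target line. A real PDG-based slicer also tracks control dependence.
import ast

def backward_slice(source: str, target_line: int) -> list[int]:
    tree = ast.parse(source)
    defs: dict[str, list[int]] = {}   # name -> lines that assign it
    uses: dict[int, set[str]] = {}    # line -> names read there
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.setdefault(node.id, []).append(node.lineno)
            else:
                uses.setdefault(node.lineno, set()).add(node.id)
    slice_lines, worklist = {target_line}, [target_line]
    while worklist:
        line = worklist.pop()
        for name in uses.get(line, ()):
            for def_line in defs.get(name, ()):
                if def_line < line and def_line not in slice_lines:
                    slice_lines.add(def_line)
                    worklist.append(def_line)
    return sorted(slice_lines)
```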
arXiv Detail & Related papers (2023-05-16T09:43:04Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
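A minimal sketch of the feedback loop, assuming generic `llm` and `run_tests` callables (the original work drives this with few-shot demonstrations rather than bare instructions):

```python
# Self-debugging loop: generate, execute, revise with the failure feedback.
from typing import Callable, Optional

def self_debug(
    task: str,
    llm: Callable[[str], str],
    run_tests: Callable[[str], Optional[str]],  # None if passing, else error log
    max_rounds: int = 3,
) -> str:
    program = llm(f"Solve:\n{task}")
    for _ in range(max_rounds):
        feedback = run_tests(program)
        if feedback is None:
            break  # all tests pass
        program = llm(
            f"Task:\n{task}\nYour program:\n{program}\n"
            f"It failed with:\n{feedback}\nExplain the bug and return a fix."
        )
    return program
```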
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
- FixEval: Execution-based Evaluation of Program Fixes for Programming Problems [23.987104440395576]
We introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their corresponding fixes.
FixEval offers an extensive collection of unit tests to evaluate the correctness of model-generated program fixes.
Our experiments show that match-based metrics do not reflect model-generated program fixes accurately.
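Execution-based scoring of this kind reduces to running each candidate fix against the problem's tests. A minimal sketch, assuming (stdin, expected_stdout) test pairs as in competitive programming:

```python
# A fix counts as correct only if it passes every unit test, regardless of
# how closely it matches the reference patch textually.
import os
import subprocess
import tempfile

def passes_tests(candidate_code: str, test_cases: list[tuple[str, str]]) -> bool:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code)
        path = f.name
    try:
        for stdin, expected in test_cases:
            try:
                result = subprocess.run(
                    ["python", path], input=stdin, capture_output=True,
                    text=True, timeout=5,
                )
            except subprocess.TimeoutExpired:
                return False
            if result.returncode != 0 or result.stdout.strip() != expected.strip():
                return False
        return True
    finally:
        os.unlink(path)
```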
arXiv Detail & Related papers (2022-06-15T20:18:43Z)
- Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub repositories.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
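One plausible reading of the deletion-only split is a line-level diff check, sketched below (this interpretation is an assumption, not necessarily the paper's exact definition):

```python
# A fix is "deletion-only" if the buggy-to-fixed diff removes lines but
# inserts none; everything else counts as a non-deletion fix.
import difflib

def is_deletion_only(buggy: str, fixed: str) -> bool:
    diff = difflib.ndiff(buggy.splitlines(), fixed.splitlines())
    changed = [d for d in diff if d.startswith(("+ ", "- "))]
    return bool(changed) and all(d.startswith("- ") for d in changed)
```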
arXiv Detail & Related papers (2021-04-16T05:27:04Z)