Related papers: Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models

Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models

URL: http://arxiv.org/abs/2502.15292v1
Date: Fri, 21 Feb 2025 08:37:02 GMT
Title: Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models
Authors: Jianming Chang, Xin Zhou, Lulu Wang, David Lo, Bixin Li,
Abstract summary: This paper presents BugCerberus, the first hierarchical bug localization framework powered by three customized large language models.<n>First, BugCerberus analyzes intermediate representations of bug-related programs at file, function, and statement levels and extracts bug-related contextual information from the representations.<n>Second, BugCerberus designs three customized LLMs at each level using bug reports and contexts to learn the patterns of bugs.<n>Finally, BugCerberus hierarchically searches for bug-related code elements through well-tuned models to localize bugs at three levels.
Score: 10.798803883293932
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automated issue fixing is a critical task in software debugging and has recently garnered significant attention from academia and industry. However, existing fixing techniques predominantly focus on the repair phase, often overlooking the importance of improving the preceding bug localization phase. As a foundational step in issue fixing, bug localization plays a pivotal role in determining the overall effectiveness of the entire process. To enhance the precision of issue fixing by accurately identifying bug locations in large-scale projects, this paper presents BugCerberus, the first hierarchical bug localization framework powered by three customized large language models. First, BugCerberus analyzes intermediate representations of bug-related programs at file, function, and statement levels and extracts bug-related contextual information from the representations. Second, BugCerberus designs three customized LLMs at each level using bug reports and contexts to learn the patterns of bugs. Finally, BugCerberus hierarchically searches for bug-related code elements through well-tuned models to localize bugs at three levels. With BugCerberus, we further investigate the impact of bug localization on the issue fixing. We evaluate BugCerberus on the widely-used benchmark SWE-bench-lite. The experimental results demonstrate that BugCerberus outperforms all baselines. Specifically, at the fine-grained statement level, BugCerberus surpasses the state-of-the-art in Top-N (N=1, 3, 5, 10) by 16.5%, 5.4%, 10.2%, and 23.1%, respectively. Moreover, in the issue fixing experiments, BugCerberus improves the fix rate of the existing issue fixing approach Agentless by 17.4% compared to the best baseline, highlighting the significant impact of enhanced bug localization on automated issue fixing.

Related papers

Bug Priority Change Prediction: An Exploratory Study on Apache Software [7.264561489832595]
We propose a novel two-phase bug report priority change prediction method based on bug fixing evolution features and class imbalance handling strategy.<n>To evaluate the performance of our method, we conducted experiments on a bug dataset constructed from 32 non-trivial Apache projects.
arXiv Detail & Related papers (2025-12-10T00:59:51Z)
BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills [59.003563837981886]
High quality bugs are key to training the next generation of language model based software engineering (SWE) agents.<n>We introduce a novel method for synthetic generation of difficult and diverse bugs.
arXiv Detail & Related papers (2025-10-22T17:58:56Z)
Where LLM Agents Fail and How They can Learn From Failures [62.196870049524364]
Large Language Model (LLM) agents have shown promise in solving complex, multi-step tasks.<n>They amplify vulnerability to cascading failures, where a single root-cause error propagates through subsequent decisions.<n>Current systems lack a framework that can comprehensively understand agent error in a modular and systemic way.<n>We introduce the AgentErrorTaxonomy, a modular classification of failure modes spanning memory, reflection, planning, action, and system-level operations.
arXiv Detail & Related papers (2025-09-29T18:20:27Z)
Improved IR-based Bug Localization with Intelligent Relevance Feedback [2.9312156642007294]
Software bugs pose a significant challenge during development and maintenance, and practitioners spend nearly 50% of their time dealing with bugs.<n>Many existing techniques adopt Information Retrieval (IR) to localize a reported bug using textual and semantic relevance between bug reports and source code.<n>We present a novel technique for bug localization - BRaIn - that addresses the contextual gaps by assessing the relevance between bug reports and code.
arXiv Detail & Related papers (2025-01-17T20:29:38Z)
BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning [1.9854146581797698]
BLAZE is an approach that employs dynamic chunking and hard example learning. It fine-tunes a GPT-based model using challenging bug cases to enhance cross-project and cross-language bug localization. BLAZE achieves up to an increase of 120% in Top 1 accuracy, 144% in Mean Average Precision (MAP), and 100% in Mean Reciprocal Rank (MRR)
arXiv Detail & Related papers (2024-07-24T20:44:36Z)
Leveraging Stack Traces for Spectrum-based Fault Localization in the Absence of Failing Tests [44.13331329339185]
We introduce a new approach, SBEST, that integrates stack trace data with test coverage to enhance fault localization. Our approach shows a significant improvement, increasing Mean Average Precision (MAP) by 32.22% and Mean Reciprocal Rank (MRR) by 17.43% over traditional stack trace ranking methods.
arXiv Detail & Related papers (2024-05-01T15:15:52Z)
DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs) It covers four major bug categories and 18 minor types in C++, Java, and Python. We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
Automated Bug Generation in the era of Large Language Models [6.0770779409377775]
BugFarm transforms arbitrary code into multiple complex bugs. A comprehensive evaluation of 435k+ bugs from over 1.9M mutants generated by BUGFARM.
arXiv Detail & Related papers (2023-10-03T20:01:51Z)
RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs. We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
WELL: Applying Bug Detectors to Bug Localization via Weakly Supervised Learning [37.09621161662761]
This paper proposes a WEakly supervised bug LocaLization (WELL) method to train a bug localization model. With CodeBERT finetuned on the buggy-or-not binary labeled data, WELL can address bug localization in a weakly supervised manner.
arXiv Detail & Related papers (2023-05-27T06:34:26Z)
Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers. We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
ADPTriage: Approximate Dynamic Programming for Bug Triage [0.0]
We develop a Markov decision process (MDP) model for an online bug triage task. We provide an ADP-based bug triage solution, called ADPTriage, which reflects downstream uncertainty in the bug arrivals and developers' timetables. Our result shows a significant improvement over the myopic approach in terms of assignment accuracy and fixing time.
arXiv Detail & Related papers (2022-11-02T04:42:21Z)
BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization. We provide a general benchmark with a diversity of real and synthetic Java bugs. We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem. The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network. To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub. We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch. We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
arXiv Detail & Related papers (2021-04-16T05:27:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.