Identifying Root Cause of bugs by Capturing Changed Code Lines with Relational Graph Neural Networks
- URL: http://arxiv.org/abs/2505.00990v1
- Date: Fri, 02 May 2025 04:29:09 GMT
- Title: Identifying Root Cause of bugs by Capturing Changed Code Lines with Relational Graph Neural Networks
- Authors: Jiaqi Zhang, Shikai Guo, Hui Li, Chenchen Li, Yu Chai, Rong Chen,
- Abstract summary: We propose a method called RC-Detection to detect root-cause deletion lines in changed code lines.<n>RC-Detection is used to detect root-cause deletion lines in changed code lines, thereby identifying the root cause of introduced bugs in bug-fixing commits.<n>Our experiments show that, compared to the most advanced root cause detection methods, RC-Detection improved Recall@1, Recall@2, Recall@3, and MFR by at 4.107%, 5.113%, 4.289%, and 24.536%, respectively.
- Score: 7.676213873923721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Just-In-Time defect prediction model helps development teams improve software quality and efficiency by assessing whether code changes submitted by developers are likely to introduce defects in real-time, allowing timely identification of potential issues during the commit stage. However, two main challenges exist in current work due to the reality that all deleted and added lines in bug-fixing commits may be related to the root cause of the introduced bug: 1) lack of effective integration of heterogeneous graph information, and 2) lack of semantic relationships between changed code lines. To address these challenges, we propose a method called RC-Detection, which utilizes relational graph convolutional network to capture the semantic relationships between changed code lines. RC-Detection is used to detect root-cause deletion lines in changed code lines, thereby identifying the root cause of introduced bugs in bug-fixing commits. To evaluate the effectiveness of RC-Detection, we used three datasets that contain high-quality bug-fixing and bug-introducing commits. Extensive experiments were conducted to evaluate the performance of our model by collecting data from 87 open-source projects, including 675 bug-fix commits. The experimental results show that, compared to the most advanced root cause detection methods, RC-Detection improved Recall@1, Recall@2, Recall@3, and MFR by at 4.107%, 5.113%, 4.289%, and 24.536%, respectively.
Related papers
- BugGen: A Self-Correcting Multi-Agent LLM Pipeline for Realistic RTL Bug Synthesis [1.9291502706655312]
We introduce BugGen, a first of its kind, fully autonomous, multi-agent pipeline to generate, insert, and validate functional bugs in RTL.<n> BugGen partitions modules, selects mutation targets via a closed-loop agentic architecture, and employs iterative refinement and rollback mechanisms.<n> evaluated across five OpenTitan IP blocks, BugGen produced 500 unique bugs with 94% functional accuracy and achieved a throughput of 17.7 validated bugs per hour-over five times faster than typical manual expert insertion.
arXiv Detail & Related papers (2025-06-12T09:02:20Z) - Towards Understanding Bugs in Distributed Training and Inference Frameworks for Large Language Models [7.486731499255164]
This paper conducts the first large-scale empirical analysis of 308 fixed bugs across three popular distributed training/inference frameworks: DeepSpeed, Megatron-LM, and Colossal-AI.<n>We examine bug symptoms, root causes, bug identification and fixing efforts, and common low-effort fixing strategies.
arXiv Detail & Related papers (2025-06-12T07:24:59Z) - KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG [63.82127103851471]
Retrieval-Augmented Generation (RAG) enables large language models to access broader knowledge sources.<n>We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance.<n>We present KARE-RAG, which improves knowledge utilization through three key innovations.
arXiv Detail & Related papers (2025-06-03T06:31:17Z) - LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets [5.191767648600372]
We investigate the utility of Large Language Models for detecting tangled code changes by leveraging both commit messages and method-level code diffs.<n>Our results demonstrate that combining commit messages with code diffs significantly enhances model performance.<n>Applying our approach to 49 open-source projects improves the distributional separability of code metrics between buggy and non-buggy methods.
arXiv Detail & Related papers (2025-05-13T06:26:13Z) - Detecting the Root Cause Code Lines in Bug-Fixing Commits by Heterogeneous Graph Learning [3.6066079349976614]
Automated defect prediction tools can proactively identify software changes prone to defects within software projects.<n>Existing work in heterogeneous and complex software projects continues to face challenges, such as struggling with heterogeneous commit structures and ignoring cross-line dependencies in code changes.<n>We propose an approach called RC_Detector, which consists of three main components: the bug-fixing graph construction component, the code semantic aggregation component, and the cross-line semantic retention component.
arXiv Detail & Related papers (2025-05-02T05:39:50Z) - Unveiling Pitfalls: Understanding Why AI-driven Code Agents Fail at GitHub Issue Resolution [22.03052751722933]
Python execution errors during the issue resolution phase correlate with lower resolution rates and increased reasoning overheads.<n>We have identified the most prevalent errors -- such as ModuleNotFoundError and TypeError -- and highlighted particularly challenging errors like OSError and database-related issues.
arXiv Detail & Related papers (2025-03-16T06:24:51Z) - LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [62.12404317786005]
EvoCoder is a continuous learning framework for issue code reproduction.
Our results show a 20% improvement in issue reproduction rates over existing SOTA methods.
arXiv Detail & Related papers (2024-11-21T08:49:23Z) - CITADEL: Context Similarity Based Deep Learning Framework Bug Finding [36.34154201748415]
Existing deep learning (DL) framework testing tools have limited coverage on bug types.<n>We propose Citadel, a method that accelerates the finding of bugs in terms of efficiency and effectiveness.
arXiv Detail & Related papers (2024-06-18T01:51:16Z) - Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z) - Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z) - DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z) - TACRED Revisited: A Thorough Evaluation of the TACRED Relation
Extraction Task [80.38130122127882]
TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE)
In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement?
We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.