DeMuVGN: Effective Software Defect Prediction Model by Learning Multi-view Software Dependency via Graph Neural Networks
- URL: http://arxiv.org/abs/2410.19550v1
- Date: Fri, 25 Oct 2024 13:24:04 GMT
- Title: DeMuVGN: Effective Software Defect Prediction Model by Learning Multi-view Software Dependency via Graph Neural Networks
- Authors: Yu Qiao, Lina Gong, Yu Zhao, Yongwei Wang, Mingqiang Wei
- Abstract summary: DeMuVGN is a defect prediction model that learns multi-view software dependency via graph neural networks.
We introduce a Multi-view Software Dependency Graph that integrates data, call, and developer dependencies.
In a case study of eight open-source projects across 20 versions, DeMuVGN demonstrates significant improvements.
- Abstract: Software defect prediction (SDP) aims to identify high-risk defect modules in software development, optimizing resource allocation. While previous studies show that dependency network metrics improve defect prediction, most methods focus on code-based dependency graphs, overlooking developer factors. Current metrics, based on handcrafted features like ego and global network metrics, fail to fully capture defect-related information. To address this, we propose DeMuVGN, a defect prediction model that learns multi-view software dependency via graph neural networks. We introduce a Multi-view Software Dependency Graph (MSDG) that integrates data, call, and developer dependencies. DeMuVGN also leverages the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance and enhance defect module identification. In a case study of eight open-source projects across 20 versions, DeMuVGN demonstrates significant improvements: i) models based on multi-view graphs improve F1 scores by 11.1% to 12.1% over single-view models; ii) DeMuVGN improves F1 scores by 17.4% to 45.8% in within-project contexts and by 17.9% to 41.0% in cross-project contexts. Additionally, DeMuVGN excels in software evolution, showing more improvement in later-stage software versions. Its strong performance across different projects highlights its generalizability. We recommend future research focus on multi-view dependency graphs for defect prediction in both mature and newly developed projects.
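The abstract names the three ingredients of DeMuVGN: a Multi-view Software Dependency Graph that merges data, call, and developer dependencies; a graph neural network that learns module representations from that graph; and SMOTE to counter the scarcity of defective modules. The sketch below is a minimal illustration of how those pieces could fit together, not the authors' implementation: the module count, edge lists, features, and labels are invented, and applying SMOTE to untrained GNN embeddings before a plain logistic-regression classifier is only one plausible arrangement of the components the abstract describes.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

# --- 1. Multi-view Software Dependency Graph (toy, hypothetical edges) ----
# Five modules (0..4); each view contributes directed edges between them.
data_dep = [(0, 1), (1, 2)]            # data dependencies (shared state/fields)
call_dep = [(0, 2), (2, 3), (3, 4)]    # call dependencies (caller -> callee)
dev_dep  = [(1, 3), (0, 4)]            # developer dependencies (shared authors)

edge_index = torch.tensor(data_dep + call_dep + dev_dep,
                          dtype=torch.long).t().contiguous()

num_modules = 5
x = torch.randn(num_modules, 16)       # toy module features
y = torch.tensor([0, 0, 0, 1, 1])      # 1 = defective (minority class)

# --- 2. GNN encoder over the merged multi-view graph ----------------------
class Encoder(torch.nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, hid_dim)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

encoder = Encoder(in_dim=16, hid_dim=8)
with torch.no_grad():                  # untrained embeddings, purely illustrative
    emb = encoder(x, edge_index).numpy()

# --- 3. SMOTE to rebalance defective vs. clean modules, then classify -----
emb_bal, y_bal = SMOTE(k_neighbors=1, random_state=0).fit_resample(emb, y.numpy())
clf = LogisticRegression(max_iter=1000).fit(emb_bal, y_bal)
print("predicted defect probability per module:", clf.predict_proba(emb)[:, 1])
```

In the paper's setting, the dependency graph would be extracted per project version from real code and commit histories, and the GNN would be trained end-to-end on labeled modules rather than used untrained as in this toy sketch.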
Related papers
- Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
The Lingma SWE-GPT series learns from and simulates real-world code submission activities.
Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z) - Imbalanced Graph-Level Anomaly Detection via Counterfactual Augmentation and Feature Learning [1.3756846638796]
We propose an imbalanced GLAD method via counterfactual augmentation and feature learning.
We apply the model to brain disease datasets, which can prove the capability of our work.
arXiv Detail & Related papers (2024-07-13T13:40:06Z) - Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z) - Deep Incremental Learning of Imbalanced Data for Just-In-Time Software Defect Prediction [3.2022080692044352]
This work stems from three observations on prior Just-In-Time Software Defect Prediction (JIT-SDP) models.
First, prior studies treat the JIT-SDP problem solely as a classification problem.
Second, prior JIT-SDP studies do not consider that class balancing processing may change the underlying characteristics of software changeset data.
arXiv Detail & Related papers (2023-10-18T19:42:34Z) - Learning Strong Graph Neural Networks with Weak Information [64.64996100343602]
We develop a principled approach to the problem of graph learning with weak information (GLWI).
We propose D$2$PT, a dual-channel GNN framework that performs long-range information propagation not only on the input graph with incomplete structure, but also on a global graph that encodes global semantic similarities.
arXiv Detail & Related papers (2023-05-29T04:51:09Z) - Graph-Based Machine Learning Improves Just-in-Time Defect Prediction [0.38073142980732994]
We use graph-based machine learning to improve Just-In-Time (JIT) defect prediction.
We show that our best model can predict whether or not a code change will lead to a defect with an F1 score as high as 77.55%.
This represents a 152% higher F1 score and a 3% higher MCC than the state-of-the-art JIT defect prediction approach.
arXiv Detail & Related papers (2021-10-11T16:00:02Z) - Predicting the Number of Reported Bugs in a Software Repository [0.0]
We examine eight different time series forecasting models, including Long Short-Term Memory Neural Networks (LSTM), auto-regressive integrated moving average (ARIMA), and Random Forest Regressor.
We analyze the quality of long-term prediction for each model based on different performance metrics.
The assessment is conducted on Mozilla, which is a large open-source software application.
arXiv Detail & Related papers (2021-04-24T19:06:35Z) - Model-Agnostic Graph Regularization for Few-Shot Learning [60.64531995451357]
We present a comprehensive study on graph embedded few-shot learning.
We introduce a graph regularization approach that allows a deeper understanding of the impact of incorporating graph information between labels.
Our approach improves the performance of strong base learners by up to 2% on Mini-ImageNet and 6.7% on ImageNet-FS.
arXiv Detail & Related papers (2021-02-14T05:28:13Z) - Software Defect Prediction Based On Deep Learning Models: Performance Study [0.5735035463793008]
Two deep learning models, Stack Sparse Auto-Encoder (SSAE) and Deep Belief Network (DBN), are deployed to classify NASA datasets.
According to the conducted experiments, accuracy is enhanced for the datasets with sufficient samples.
arXiv Detail & Related papers (2020-04-02T06:02:14Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity-inducing adversarial loss for learning latent variables and thereby obtain the diversity in output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)