Graph-Based Machine Learning Improves Just-in-Time Defect Prediction
- URL: http://arxiv.org/abs/2110.05371v3
- Date: Fri, 14 Apr 2023 16:02:35 GMT
- Title: Graph-Based Machine Learning Improves Just-in-Time Defect Prediction
- Authors: Jonathan Bryan and Pablo Moriano
- Abstract summary: We use graph-based machine learning to improve Just-In-Time (JIT) defect prediction.
We show that our best model can predict whether or not a code change will lead to a defect with an F1 score as high as 77.55%.
This represents a 152% higher F1 score and a 3% higher MCC over the state-of-the-art JIT defect prediction.
- Score: 0.38073142980732994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing complexity of today's software requires the contribution of
thousands of developers. This complex collaboration structure makes developers
more likely to introduce defect-prone changes that lead to software faults.
Determining when these defect-prone changes are introduced has proven
challenging, and using traditional machine learning (ML) methods to make these
determinations seems to have reached a plateau. In this work, we build
contribution graphs consisting of developers and source files to capture the
nuanced complexity of changes required to build software. By leveraging these
contribution graphs, our research shows the potential of using graph-based ML
to improve Just-In-Time (JIT) defect prediction. We hypothesize that features
extracted from the contribution graphs may be better predictors of defect-prone
changes than intrinsic features derived from software characteristics. We
corroborate our hypothesis using graph-based ML for classifying edges that
represent defect-prone changes. This new framing of the JIT defect prediction
problem leads to remarkably better results. We test our approach on 14
open-source projects and show that our best model can predict whether or not a
code change will lead to a defect with an F1 score as high as 77.55% and a
Matthews correlation coefficient (MCC) as high as 53.16%. This represents a
152% higher F1 score and a 3% higher MCC over the state-of-the-art JIT defect
prediction. We describe limitations, open challenges, and how this method can
be used for operational JIT defect prediction.
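As a concrete illustration of this edge-classification framing (a minimal sketch, not the authors' implementation; the commit schema and feature choices below are hypothetical), one could build the contribution graph with networkx and feed simple graph-derived edge features to an off-the-shelf classifier:

```python
# Illustrative sketch of the edge-classification framing from the abstract;
# not the authors' implementation. The commit schema below is hypothetical.
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

commits = [
    {"dev": "alice", "file": "core/parser.py", "defect": 0},
    {"dev": "bob",   "file": "core/parser.py", "defect": 1},
    {"dev": "alice", "file": "ui/view.py",     "defect": 0},
]

# Bipartite contribution graph: developers and files are nodes,
# code changes are the edges to be classified.
G = nx.Graph()
for c in commits:
    G.add_edge(c["dev"], c["file"])

deg = dict(G.degree())

def edge_features(dev, f):
    # Simple graph-derived features for the (developer, file) change:
    # developer activity, file contributor count, and the size of the
    # developer's two-hop collaboration neighborhood.
    co_devs = {d for shared in G[dev] for d in G[shared]} - {dev}
    return [deg[dev], deg[f], len(co_devs)]

X = np.array([edge_features(c["dev"], c["file"]) for c in commits])
y = np.array([c["defect"] for c in commits])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X))  # per-change defect-proneness predictions
```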
Related papers
- DeMuVGN: Effective Software Defect Prediction Model by Learning Multi-view Software Dependency via Graph Neural Networks [37.928355252723996]
DeMuVGN is a defect prediction model that learns multi-view software dependency via graph neural networks.
We introduce a Multi-view Software Dependency Graph that integrates data, call, and developer dependencies.
In a case study of eight open-source projects across 20 versions, DeMuVGN demonstrates significant improvements.
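A minimal sketch of what such a multi-view dependency graph could look like (hypothetical modules and edges; not the paper's implementation):

```python
# Hedged sketch of a multi-view software dependency graph in the spirit of
# DeMuVGN's description (data, call, and developer dependencies as separate
# edge types); not the paper's implementation.
import networkx as nx

G = nx.MultiDiGraph()  # one typed edge per dependency view

# Hypothetical example edges for each view.
G.add_edge("mod_a.py", "mod_b.py", view="data")       # mod_a reads data written by mod_b
G.add_edge("mod_a.py", "mod_c.py", view="call")       # mod_a calls into mod_c
G.add_edge("mod_b.py", "mod_c.py", view="developer")  # same developer edits both

# A GNN would aggregate over each edge type separately; here we just
# show how to select one view's subgraph.
call_edges = [(u, v) for u, v, d in G.edges(data=True) if d["view"] == "call"]
print(nx.DiGraph(call_edges).edges())
```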
arXiv Detail & Related papers (2024-10-25T13:24:04Z)
- Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase [10.961209762486684]
Code revert prediction aims to forecast or predict the likelihood of code changes being reverted or rolled back in software development.
Previous methods for code defect detection relied on independent features but ignored relationships between code scripts.
This paper presents a systematic empirical study for code revert prediction that integrates the code import graph with code features.
arXiv Detail & Related papers (2024-03-14T15:54:29Z)
- Variance of ML-based software fault predictors: are we really improving fault prediction? [0.3222802562733786]
We experimentally analyze the variance of a state-of-the-art fault prediction approach.
We observed a maximum variance of 10.10% in terms of the per-class accuracy metric.
arXiv Detail & Related papers (2023-10-26T09:31:32Z)
- Uncertainty Quantification over Graph with Conformalized Graph Neural Networks [52.20904874696597]
Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data.
GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant.
We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates.
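The base guarantee that CF-GNN extends to graphs comes from split conformal prediction, which is short enough to sketch (a generic classification example, not the CF-GNN algorithm itself):

```python
# Minimal split conformal prediction for classification: not CF-GNN itself,
# but the base procedure it extends to graph-structured data.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration set: softmax scores and true labels.
n_cal, n_classes = 500, 3
probs_cal = rng.dirichlet(np.ones(n_classes), size=n_cal)
y_cal = rng.integers(0, n_classes, size=n_cal)

alpha = 0.1  # target miscoverage: sets cover the truth >= 90% of the time

# Nonconformity score: 1 - probability assigned to the true class.
scores = 1.0 - probs_cal[np.arange(n_cal), y_cal]
# Finite-sample-corrected quantile of the calibration scores.
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, method="higher")

# Prediction set for a new example: all classes whose score falls below q.
probs_test = rng.dirichlet(np.ones(n_classes))
pred_set = [k for k in range(n_classes) if 1.0 - probs_test[k] <= q]
print(pred_set)  # contains the true label with probability >= 1 - alpha
```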
arXiv Detail & Related papers (2023-05-23T21:38:23Z)
- GIF: A General Graph Unlearning Strategy via Influence Function [63.52038638220563]
Graph Influence Function (GIF) is a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in the deleted data.
We conduct extensive experiments on four representative GNN models and three benchmark datasets to justify GIF's superiority in terms of unlearning efficacy, model utility, and unlearning efficiency.
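GIF builds on the standard influence-function estimate (Koh and Liang style); in its general, non-graph form the parameter-change approximation reads:

```latex
% General influence-function approximation that GIF builds on; not the
% graph-specific form. Upweighting a training point z by \epsilon shifts
% the empirical-risk minimizer \hat{\theta} approximately as follows:
\hat{\theta}_{\epsilon,z}
  = \arg\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} L(z_i,\theta) + \epsilon\, L(z,\theta),
\qquad
\left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0}
  = -H_{\hat{\theta}}^{-1}\,\nabla_{\theta} L(z,\hat{\theta}),
\quad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2} L(z_i,\hat{\theta}).
```

Setting $\epsilon = -1/n$ approximates deleting $z$ from the training set, which is the unlearning use case; the paper adapts this estimate to account for structural dependence among graph neighbors.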
arXiv Detail & Related papers (2023-04-06T03:02:54Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in-domain and out-of-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- IRJIT: A Simple, Online, Information Retrieval Approach for Just-In-Time Software Defect Prediction [10.084626547964389]
Just-in-Time software defect prediction (JIT-SDP) prevents the introduction of defects into the software by identifying them at commit check-in time.
Current software defect prediction approaches rely on manually crafted features such as change metrics and involve machine learning or deep learning models that are expensive to train.
We propose an approach called IRJIT that employs information retrieval on source code and labels new commits as buggy or clean based on their similarity to past buggy or clean commits.
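The retrieval idea is simple to sketch (a TF-IDF nearest-neighbor toy, assuming commit diffs are represented as plain text; not IRJIT's actual indexing):

```python
# Sketch of labeling a new commit by textual similarity to past commits,
# in the spirit of IRJIT's description; not the paper's implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical history of commit diffs with known outcomes.
past_diffs = [
    "fix null check in parser handle empty input",
    "add caching layer to request handler",
    "refactor view rendering remove dead code",
]
past_labels = ["buggy", "clean", "clean"]

vec = TfidfVectorizer()
X_past = vec.fit_transform(past_diffs)

def label_commit(diff_text, k=1):
    # Label a new commit after its k most similar past commits.
    sims = cosine_similarity(vec.transform([diff_text]), X_past)[0]
    top = sims.argsort()[::-1][:k]
    votes = [past_labels[i] for i in top]
    return max(set(votes), key=votes.count)

print(label_commit("handle empty input in parser null fix"))  # likely "buggy"
```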
arXiv Detail & Related papers (2022-10-05T17:54:53Z)
- Defect Prediction Using Stylistic Metrics [2.286041284499166]
This paper analyzes the impact of stylistic metrics on both within-project and cross-project defect prediction.
Experiments are conducted on 14 releases of 5 popular open-source projects.
arXiv Detail & Related papers (2022-06-22T10:11:05Z)
- Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Convolutional Neural Networks (CNNs) via an error simulation engine.
These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults.
We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t. SASSIFI, which only implements a limited set of error models.
arXiv Detail & Related papers (2022-06-04T19:45:02Z)
- A Universal Error Measure for Input Predictions Applied to Online Graph Problems [57.58926849872494]
We introduce a novel measure for quantifying the error in input predictions.
The measure captures errors due to absent predicted requests as well as unpredicted actual requests.
arXiv Detail & Related papers (2022-05-25T15:24:03Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
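The method is straightforward to sketch in PyTorch: keep BatchNorm layers in training mode at inference so each test batch is normalized with its own statistics (a generic illustration under that assumption, not the paper's exact evaluation protocol):

```python
# Generic sketch of prediction-time batch normalization in PyTorch:
# normalize test batches with their own statistics rather than the running
# averages stored during training. Not the paper's exact protocol.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 2))

def predict_with_test_batch_stats(model, x):
    model.eval()
    # Flip only the BatchNorm layers back to train mode so forward() uses
    # the current batch's mean/variance (note: this also updates the
    # running statistics as a side effect).
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()
    with torch.no_grad():
        return model(x)

x_shifted = torch.randn(64, 16) + 3.0  # covariate-shifted test batch
logits = predict_with_test_batch_stats(model, x_shifted)
print(logits.shape)  # torch.Size([64, 2])
```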