An Empirical Study on Bug Severity Estimation using Source Code Metrics and Static Analysis
- URL: http://arxiv.org/abs/2206.12927v2
- Date: Fri, 2 Aug 2024 19:14:32 GMT
- Title: An Empirical Study on Bug Severity Estimation using Source Code Metrics and Static Analysis
- Authors: Ehsan Mashhadi, Shaiful Chowdhury, Somayeh Modaberi, Hadi Hemmati, Gias Uddin,
- Abstract summary: We study 3,358 buggy methods with different severity labels from 19 Java open-source projects.
Results show that code metrics are useful in predicting buggy code, but they cannot estimate the severity level of the bugs.
Our categorization shows that Security bugs have high severity in most cases while Edge/Boundary faults have low severity.
- Score: 0.8621608193534838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). In general, these works leverage a diverse set of metrics, tools, and techniques to predict which classes, methods, lines, or commits are buggy. However, most existing work in this domain treats all bugs the same, which is not the case in practice. The more severe the bugs the higher their consequences. Therefore, it is important for a defect prediction method to estimate the severity of the identified bugs, so that the higher severity ones get immediate attention. In this paper, we provide a quantitative and qualitative study on two popular datasets (Defects4J and Bugs.jar), using 10 common source code metrics, and two popular static analysis tools (SpotBugs and Infer) for analyzing their capability to predict defects and their severity. We studied 3,358 buggy methods with different severity labels from 19 Java open-source projects. Results show that although code metrics are useful in predicting buggy code (Lines of the Code, Maintainable Index, FanOut, and Effort metrics are the best), they cannot estimate the severity level of the bugs. In addition, we observed that static analysis tools have weak performance in both predicting bugs (F1 score range of 3.1%-7.1%) and their severity label (F1 score under 2%). We also manually studied the characteristics of the severe bugs to identify possible reasons behind the weak performance of code metrics and static analysis tools in estimating their severity. Also, our categorization shows that Security bugs have high severity in most cases while Edge/Boundary faults have low severity. Finally, we discuss the practical implications of the results and propose new directions for future research.
Related papers
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - Method-Level Bug Severity Prediction using Source Code Metrics and LLMs [0.628122931748758]
We investigate source code metrics, source code representation using large language models (LLMs), and their combination in predicting bug severity labels.
Our results suggest that Decision Tree and Random Forest models outperform other models regarding our several evaluation metrics.
CodeBERT finetuning improves the bug severity prediction results significantly in the range of 29%-140% for several evaluation metrics.
arXiv Detail & Related papers (2023-09-06T14:38:07Z) - Bridging Precision and Confidence: A Train-Time Loss for Calibrating
Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accurateness of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z) - ADPTriage: Approximate Dynamic Programming for Bug Triage [0.0]
We develop a Markov decision process (MDP) model for an online bug triage task.
We provide an ADP-based bug triage solution, called ADPTriage, which reflects downstream uncertainty in the bug arrivals and developers' timetables.
Our result shows a significant improvement over the myopic approach in terms of assignment accuracy and fixing time.
arXiv Detail & Related papers (2022-11-02T04:42:21Z) - Infrared: A Meta Bug Detector [10.541969253100815]
We propose a new approach, called meta bug detection, which offers three crucial advantages over existing learning-based bug detectors.
Our evaluation shows our meta bug detector (MBD) is effective in catching a variety of bugs including null pointer dereference, array index out-of-bound, file handle leak, and even data races in concurrent programs.
arXiv Detail & Related papers (2022-09-18T09:08:51Z) - On Distribution Shift in Learning-based Bug Detectors [4.511923587827301]
We train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution.
We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution.
arXiv Detail & Related papers (2022-04-21T12:17:22Z) - Learning to Reduce False Positives in Analytic Bug Detectors [12.733531603080674]
We propose a Transformer-based learning approach to identify false positive bug warnings.
We demonstrate that our models can improve the precision of static analysis by 17.5%.
arXiv Detail & Related papers (2022-03-08T04:26:26Z) - DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z) - D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using
Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z) - Learning a Unified Sample Weighting Network for Object Detection [113.98404690619982]
Region sampling or weighting is significantly important to the success of modern region-based object detectors.
We argue that sample weighting should be data-dependent and task-dependent.
We propose a unified sample weighting network to predict a sample's task weights.
arXiv Detail & Related papers (2020-06-11T16:19:16Z) - Provable tradeoffs in adversarially robust classification [96.48180210364893]
We develop and leverage new tools, including recent breakthroughs from probability theory on robust isoperimetry.
Our results reveal fundamental tradeoffs between standard and robust accuracy that grow when data is imbalanced.
arXiv Detail & Related papers (2020-06-09T09:58:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.