Predicting the Number of Reported Bugs in a Software Repository
- URL: http://arxiv.org/abs/2104.12001v1
- Date: Sat, 24 Apr 2021 19:06:35 GMT
- Title: Predicting the Number of Reported Bugs in a Software Repository
- Authors: Hadi Jahanshahi, Mucahit Cevik, Ayşe Başar
- Abstract summary: We examine eight different time series forecasting models, including Long Short Term Memory Neural Networks (LSTM), auto-regressive integrated moving average (ARIMA), and Random Forest Regressor.
We analyze the quality of long-term prediction for each model based on different performance metrics.
The assessment is conducted on Mozilla, which is a large open-source software application.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting the bug growth pattern is a complicated, unresolved task that
needs considerable attention. Advance knowledge of the likely number of bugs
discovered in the software system helps software developers in designating
sufficient resources at a convenient time. The developers may also use such
information to take necessary actions to increase the quality of the system and
in turn customer satisfaction. In this study, we examine eight different time
series forecasting models, including Long Short Term Memory Neural Networks
(LSTM), auto-regressive integrated moving average (ARIMA), and Random Forest
Regressor. Further, we assess the impact of exogenous variables such as
software release dates by incorporating those into the prediction models. We
analyze the quality of long-term prediction for each model based on different
performance metrics. The assessment is conducted on Mozilla, which is a large
open-source software application. The dataset is originally mined from Bugzilla
and contains the number of bugs for the project between Jan 2010 and Dec 2019.
Our numerical analysis provides insights on evaluating the trends in a bug
repository. We observe that LSTM is effective when considering long-run
predictions whereas Random Forest Regressor enriched by exogenous variables
performs better for predicting the number of bugs in the short term.
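
As a rough illustration of the setup the abstract describes, the sketch below frames a bug-count series as a supervised learning problem: each row holds the previous few counts plus an exogenous release flag for the period being predicted. This is not the authors' code. The function names (`make_lagged_rows`, `mean_absolute_error`), the lag length, and the weekly numbers are all assumptions made up for illustration, and a naive last-value forecast stands in for the models the paper actually compares.

```python
from typing import List, Sequence, Tuple


def make_lagged_rows(
    counts: Sequence[int],
    release_flags: Sequence[int],
    n_lags: int,
) -> Tuple[List[List[int]], List[int]]:
    """Turn a count series into (features, targets) rows.

    Each feature row holds the previous `n_lags` counts (oldest first),
    followed by the exogenous release flag of the period being predicted.
    """
    if len(counts) != len(release_flags):
        raise ValueError("counts and release_flags must have equal length")
    features: List[List[int]] = []
    targets: List[int] = []
    for t in range(n_lags, len(counts)):
        features.append(list(counts[t - n_lags:t]) + [release_flags[t]])
        targets.append(counts[t])
    return features, targets


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic, made-up weekly bug counts (NOT taken from the Mozilla dataset).
weekly_bugs = [12, 15, 11, 30, 14, 13, 16, 33]
# Hypothetical exogenous variable: 1 if a release happened that week.
release_week = [0, 0, 0, 1, 0, 0, 0, 1]

N_LAGS = 3
X, y = make_lagged_rows(weekly_bugs, release_week, n_lags=N_LAGS)

# Naive baseline: predict the most recent observed count (the last lag).
naive_pred = [row[N_LAGS - 1] for row in X]
baseline_mae = mean_absolute_error(y, naive_pred)
print(f"rows: {len(X)}, naive baseline MAE: {baseline_mae:.1f}")
```

Rows built this way could be passed to any regressor that accepts tabular features. The naive baseline only gives a reference error level against which such a model's short-term forecasts might be compared.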
Related papers
- Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z)
- Rating Multi-Modal Time-Series Forecasting Models (MM-TSFM) for Robustness Through a Causal Lens [10.103561529332184]
We focus on multi-modal time-series forecasting, where imprecision due to noisy or incorrect data can lead to erroneous predictions.
We introduce a rating methodology to assess the robustness of Multi-Modal Time-Series Forecasting Models.
arXiv Detail & Related papers (2024-06-12T17:39:16Z)
- Performative Time-Series Forecasting [71.18553214204978]
We formalize performative time-series forecasting (PeTS) from a machine-learning perspective.
We propose a novel approach, Feature Performative-Shifting (FPS), which leverages the concept of delayed response to anticipate distribution shifts.
We conduct comprehensive experiments using multiple time-series models on COVID-19 and traffic forecasting tasks.
arXiv Detail & Related papers (2023-10-09T18:34:29Z)
- Method-Level Bug Severity Prediction using Source Code Metrics and LLMs [0.628122931748758]
We investigate source code metrics, source code representation using large language models (LLMs), and their combination in predicting bug severity labels.
Our results suggest that Decision Tree and Random Forest models outperform other models across several evaluation metrics.
CodeBERT finetuning improves the bug severity prediction results significantly in the range of 29%-140% for several evaluation metrics.
arXiv Detail & Related papers (2023-09-06T14:38:07Z)
- Backward-Compatible Prediction Updates: A Probabilistic Approach [12.049279991559091]
We formalize the Prediction Update Problem and present an efficient probabilistic approach to solving it.
In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies for backward-compatible prediction updates.
arXiv Detail & Related papers (2021-07-02T13:05:31Z)
- Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)
- Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
- Curse of Small Sample Size in Forecasting of the Active Cases in COVID-19 Outbreak [0.0]
During the COVID-19 pandemic, a massive number of attempts have been made to predict the number of cases and other future trends of the pandemic.
However, they fail to predict, in a reliable way, the medium and long term evolution of fundamental features of COVID-19 outbreak within acceptable accuracy.
This paper gives an explanation for the failure of machine learning models in this particular forecasting problem.
arXiv Detail & Related papers (2020-11-06T23:13:34Z)
- Software Defect Prediction Based On Deep Learning Models: Performance Study [0.5735035463793008]
Two deep learning models, Stack Sparse Auto-Encoder (SSAE) and Deep Belief Network (DBN), are deployed to classify NASA datasets.
According to the conducted experiment, the accuracy for the datasets with sufficient samples is enhanced.
arXiv Detail & Related papers (2020-04-02T06:02:14Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)