Bug Destiny Prediction in Large Open-Source Software Repositories through Sentiment Analysis and BERT Topic Modeling
- URL: http://arxiv.org/abs/2504.15972v1
- Date: Tue, 22 Apr 2025 15:18:14 GMT
- Title: Bug Destiny Prediction in Large Open-Source Software Repositories through Sentiment Analysis and BERT Topic Modeling
- Authors: Sophie C. Pope, Andrew Barovic, Armin Moin,
- Abstract summary: We leverage features available before a bug is resolved to enhance predictive accuracy.<n>Our methodology incorporates sentiment analysis to derive both an emotionality score and a sentiment classification.<n>Results demonstrate that sentiment analysis serves as a valuable predictor of a bug's eventual outcome.
- Score: 3.481985817302898
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This study explores a novel approach to predicting key bug-related outcomes, including the time to resolution, time to fix, and ultimate status of a bug, using data from the Bugzilla Eclipse Project. Specifically, we leverage features available before a bug is resolved to enhance predictive accuracy. Our methodology incorporates sentiment analysis to derive both an emotionality score and a sentiment classification (positive or negative). Additionally, we integrate the bug's priority level and its topic, extracted using a BERTopic model, as features for a Convolutional Neural Network (CNN) and a Multilayer Perceptron (MLP). Our findings indicate that the combination of BERTopic and sentiment analysis can improve certain model performance metrics. Furthermore, we observe that balancing model inputs enhances practical applicability, albeit at the cost of a significant reduction in accuracy in most cases. To address our primary objectives, predicting time-to-resolution, time-to-fix, and bug destiny, we employ both binary classification and exact time value predictions, allowing for a comparative evaluation of their predictive effectiveness. Results demonstrate that sentiment analysis serves as a valuable predictor of a bug's eventual outcome, particularly in determining whether it will be fixed. However, its utility is less pronounced when classifying bugs into more complex or unconventional outcome categories.
Related papers
- Method-Level Bug Severity Prediction using Source Code Metrics and LLMs [0.628122931748758]
We investigate source code metrics, source code representation using large language models (LLMs), and their combination in predicting bug severity labels.
Our results suggest that Decision Tree and Random Forest models outperform other models regarding our several evaluation metrics.
CodeBERT finetuning improves the bug severity prediction results significantly in the range of 29%-140% for several evaluation metrics.
arXiv Detail & Related papers (2023-09-06T14:38:07Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Bridging Precision and Confidence: A Train-Time Loss for Calibrating
Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accurateness of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z) - Generalizability Analysis of Graph-based Trajectory Predictor with
Vectorized Representation [29.623692599892365]
Trajectory prediction is one of the essential tasks for autonomous vehicles.
Recent progress in machine learning gave birth to a series of advanced trajectory prediction algorithms.
arXiv Detail & Related papers (2022-08-06T20:19:52Z) - VisFIS: Visual Feature Importance Supervision with
Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z) - Defect Prediction Using Stylistic Metrics [2.286041284499166]
This paper aims at analyzing the impact of stylistic metrics on both within-project and crossproject defect prediction.
Experiment is conducted on 14 releases of 5 popular, open source projects.
arXiv Detail & Related papers (2022-06-22T10:11:05Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Predicting the Number of Reported Bugs in a Software Repository [0.0]
We examine eight different time series forecasting models, including Long Short Term Memory Neural Networks (LSTM), auto-regressive integrated moving average (ARIMA), and Random Forest Regressor.
We analyze the quality of long-term prediction for each model based on different performance metrics.
The assessment is conducted on Mozilla, which is a large open-source software application.
arXiv Detail & Related papers (2021-04-24T19:06:35Z) - Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z) - Curse of Small Sample Size in Forecasting of the Active Cases in
COVID-19 Outbreak [0.0]
During the COVID-19 pandemic, a massive number of attempts on the predictions of the number of cases and the other future trends of this pandemic have been made.
However, they fail to predict, in a reliable way, the medium and long term evolution of fundamental features of COVID-19 outbreak within acceptable accuracy.
This paper gives an explanation for the failure of machine learning models in this particular forecasting problem.
arXiv Detail & Related papers (2020-11-06T23:13:34Z) - Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift [81.74795324629712]
We call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z) - Ambiguity in Sequential Data: Predicting Uncertain Futures with
Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.