Related papers: Method-Level Bug Severity Prediction using Source Code Metrics and LLMs

Method-Level Bug Severity Prediction using Source Code Metrics and LLMs

URL: http://arxiv.org/abs/2309.03044v1
Date: Wed, 6 Sep 2023 14:38:07 GMT
Title: Method-Level Bug Severity Prediction using Source Code Metrics and LLMs
Authors: Ehsan Mashhadi, Hossein Ahmadvand, Hadi Hemmati
Abstract summary: We investigate source code metrics, source code representation using large language models (LLMs), and their combination in predicting bug severity labels. Our results suggest that Decision Tree and Random Forest models outperform other models regarding our several evaluation metrics. CodeBERT finetuning improves the bug severity prediction results significantly in the range of 29%-140% for several evaluation metrics.
Score: 0.628122931748758
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In the past couple of decades, significant research efforts are devoted to the prediction of software bugs. However, most existing work in this domain treats all bugs the same, which is not the case in practice. It is important for a defect prediction method to estimate the severity of the identified bugs so that the higher-severity ones get immediate attention. In this study, we investigate source code metrics, source code representation using large language models (LLMs), and their combination in predicting bug severity labels of two prominent datasets. We leverage several source metrics at method-level granularity to train eight different machine-learning models. Our results suggest that Decision Tree and Random Forest models outperform other models regarding our several evaluation metrics. We then use the pre-trained CodeBERT LLM to study the source code representations' effectiveness in predicting bug severity. CodeBERT finetuning improves the bug severity prediction results significantly in the range of 29%-140% for several evaluation metrics, compared to the best classic prediction model on source code metric. Finally, we integrate source code metrics into CodeBERT as an additional input, using our two proposed architectures, which both enhance the CodeBERT model effectiveness.

Related papers

Breaking New Ground in Software Defect Prediction: Introducing Practical and Actionable Metrics with Superior Predictive Power for Enhanced Decision-Making [0.8287206589886879]
This paper explores automated software defect prediction at the method level based on the developers' coding habits.<n>We present a systematic approach to forecasting defect-prone software methods via a human error framework.
arXiv Detail & Related papers (2025-08-06T12:52:13Z)
Position: All Current Generative Fidelity and Diversity Metrics are Flawed [58.815519650465774]
We show that all current generative fidelity and diversity metrics are flawed.<n>Our aim is to convince the research community to spend more effort in developing metrics, instead of models.
arXiv Detail & Related papers (2025-05-28T15:10:33Z)
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance.<n>This paper addresses the question of how to optimally combine the model's predictions and the provided labels.<n>Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets [5.191767648600372]
We investigate the utility of Large Language Models for detecting tangled code changes by leveraging both commit messages and method-level code diffs.<n>Our results demonstrate that combining commit messages with code diffs significantly enhances model performance.<n>Applying our approach to 49 open-source projects improves the distributional separability of code metrics between buggy and non-buggy methods.
arXiv Detail & Related papers (2025-05-13T06:26:13Z)
MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools [54.63478102768333]
Well-calibrated model confidences can be used to weigh the risk versus reward of potential actions. We propose a novel class of model-internal confidence estimators (MICE) to better assess confidence when calling tools.
arXiv Detail & Related papers (2025-04-28T18:06:38Z)
Bug Destiny Prediction in Large Open-Source Software Repositories through Sentiment Analysis and BERT Topic Modeling [3.481985817302898]
We leverage features available before a bug is resolved to enhance predictive accuracy. Our methodology incorporates sentiment analysis to derive both an emotionality score and a sentiment classification. Results demonstrate that sentiment analysis serves as a valuable predictor of a bug's eventual outcome.
arXiv Detail & Related papers (2025-04-22T15:18:14Z)
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy. By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics [0.7099737083842057]
Bug prediction aims at finding source code elements in a software system that are likely to contain defects. In this paper, we propose a function level JavaScript bug prediction model based on static source code metrics with the addition of a hybrid (static and dynamic) code analysis based metric of the number of incoming and outgoing function calls.
arXiv Detail & Related papers (2024-05-12T10:31:43Z)
Pre-training Code Representation with Semantic Flow Graph for Effective Bug Localization [4.159296619915587]
We propose a novel directed, multiple-label code graph representation named Semantic Flow Graph (SFG) We show that our method achieves state-of-the-art performance in bug localization.
arXiv Detail & Related papers (2023-08-24T13:25:17Z)
A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings. We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
Learning to predict test effectiveness [1.4213973379473652]
This article offers a machine learning model to predict the extent to which the test could cover a class in terms of a new metric called Coverageability. We offer a mathematical model to evaluate test effectiveness in terms of size and coverage of the test suite generated automatically for each class.
arXiv Detail & Related papers (2022-08-20T07:26:59Z)
An Empirical Study on Bug Severity Estimation using Source Code Metrics and Static Analysis [0.8621608193534838]
We study 3,358 buggy methods with different severity labels from 19 Java open-source projects. Results show that code metrics are useful in predicting buggy code, but they cannot estimate the severity level of the bugs. Our categorization shows that Security bugs have high severity in most cases while Edge/Boundary faults have low severity.
arXiv Detail & Related papers (2022-06-26T17:07:23Z)
DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem. The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network. To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
Predicting the Number of Reported Bugs in a Software Repository [0.0]
We examine eight different time series forecasting models, including Long Short Term Memory Neural Networks (LSTM), auto-regressive integrated moving average (ARIMA), and Random Forest Regressor. We analyze the quality of long-term prediction for each model based on different performance metrics. The assessment is conducted on Mozilla, which is a large open-source software application.
arXiv Detail & Related papers (2021-04-24T19:06:35Z)
ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models. We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs. Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples. We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries. We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.