An Empirical Study on the Classification of Bug Reports with Machine Learning
- URL: http://arxiv.org/abs/2503.00660v1
- Date: Sat, 01 Mar 2025 23:19:56 GMT
- Title: An Empirical Study on the Classification of Bug Reports with Machine Learning
- Authors: Renato Andrade, César Teixeira, Nuno Laranjeiro, Marco Vieira
- Abstract summary: We study how different factors (e.g., project language, report content) can influence the performance of models in classifying issue reports. Classification performance does not differ significantly between using the report title and the description; Support Vector Machine, Logistic Regression, and Random Forest are effective in classifying issue reports; and models trained on heterogeneous projects can classify reports from projects not present during training.
- Score: 1.1499574149885023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software defects are a major threat to the reliability of computer systems. The literature shows that more than 30% of bug reports submitted in large software projects are misclassified (i.e., are feature requests, or mistakes made by the bug reporter), leading developers to expend great effort manually inspecting them. Machine Learning algorithms can be used for the automatic classification of issue reports. Still, little is known regarding key aspects of training models, such as the influence of programming languages and issue tracking systems. In this paper, we use a dataset containing more than 660,000 issue reports, collected from heterogeneous projects hosted in different issue tracking systems, to study how different factors (e.g., project language, report content) can influence the performance of models in classifying issue reports. Results show that classification performance does not differ significantly between using the report title and the description; Support Vector Machine, Logistic Regression, and Random Forest are effective in classifying issue reports; programming languages and issue tracking systems influence classification outcomes; and models trained on heterogeneous projects can classify reports from projects not present during training. Based on these findings, we propose guidelines for future research, including recommendations for using heterogeneous data and selecting high-performing algorithms.
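For a rough sense of the pipeline the paper evaluates, here is a minimal sketch pairing TF-IDF text features with the three algorithms the study highlights. The file name and column names are illustrative assumptions, not artifacts from the paper.

```python
# Minimal sketch: TF-IDF features + the three classifiers the study highlights.
# The CSV path and the "title"/"is_bug" columns are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reports = pd.read_csv("issue_reports.csv")  # hypothetical dataset
X_train, X_test, y_train, y_test = train_test_split(
    reports["title"], reports["is_bug"], test_size=0.2, random_state=42)

for clf in (LinearSVC(), LogisticRegression(max_iter=1000),
            RandomForestClassifier(n_estimators=100)):
    pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    pipeline.fit(X_train, y_train)
    print(type(clf).__name__, f1_score(y_test, pipeline.predict(X_test)))
```

Swapping the "title" column for a description column would reproduce the title-versus-description comparison the study performs.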
Related papers
- Automated Bug Report Prioritization in Large Open-Source Projects [3.9134031118910264]
We propose a novel approach to automated bug prioritization based on the natural language text of the bug reports.
We conduct topic modeling using a variant of LDA called TopicMiner-MTM and text classification with the BERT large language model.
Experimental results using an existing reference dataset containing 85,156 bug reports of the Eclipse Platform project indicate that we outperform existing approaches in terms of Accuracy, Precision, Recall, and F1-measure of the bug report priority prediction.
arXiv Detail & Related papers (2025-04-22T13:57:48Z)
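As a rough illustration of the topic-modeling step above, the sketch below uses scikit-learn's standard LDA; the paper itself uses the TopicMiner-MTM variant, and the example reports are invented.

```python
# Illustrative only: standard LDA topic features for bug report text.
# The paper uses TopicMiner-MTM, a multi-feature LDA variant not shown here.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

reports = [
    "App crashes on startup when cache is corrupted",
    "Add dark mode support to the settings page",
]
counts = CountVectorizer(stop_words="english").fit_transform(reports)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topic_features = lda.fit_transform(counts)  # one topic distribution per report
# topic_features could then be combined with BERT-based text features
# before the priority classifier.
```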
- Automated Duplicate Bug Report Detection in Large Open Bug Repositories [3.481985817302898]
Many users and contributors of large open-source projects report software defects or enhancement requests (known as bug reports) to the issue-tracking systems.
We propose a novel approach based on machine learning methods that can automatically detect duplicate bug reports in an open bug repository.
arXiv Detail & Related papers (2025-04-21T01:55:54Z)
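A hedged sketch of the simplest form of duplicate detection, TF-IDF cosine similarity between a new report and existing ones; the paper's actual features and classifier are richer, and the example reports are invented.

```python
# Toy duplicate detection: rank existing reports by cosine similarity to a
# new report. The paper's approach uses richer features and a trained model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

existing = ["Crash when saving large files", "UI freezes on file save"]
new_report = "Application crashes while saving a big file"

vectorizer = TfidfVectorizer(stop_words="english").fit(existing + [new_report])
scores = cosine_similarity(
    vectorizer.transform([new_report]), vectorizer.transform(existing))[0]
print(max(zip(scores, existing)))  # most similar candidate duplicate
```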
- Aligning Programming Language and Natural Language: Exploring Design Choices in Multi-Modal Transformer-Based Embedding for Bug Localization [0.7564784873669823]
Bug localization is the identification of the source code files that contain a reported bug; the source code is written in a programming language, while the bug report is written in natural language.
Our study evaluated 14 distinct embedding models to gain insights into the effects of various design choices.
Our findings indicate that the pre-training strategies significantly affect the quality of the embedding.
arXiv Detail & Related papers (2024-06-25T15:01:39Z)
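A minimal sketch of how such embeddings can rank source files against a bug report; the model name is an arbitrary placeholder rather than one of the 14 the study evaluates, and the file contents are stubs.

```python
# Rank source files by embedding similarity to a bug report. The model name
# is an arbitrary placeholder, not one of the embeddings the study evaluates.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
bug_report = "NullPointerException when parsing empty config file"
files = {"ConfigParser.java": "public class ConfigParser { ... }",  # stub
         "Logger.java": "public class Logger { ... }"}              # stub

report_emb = model.encode(bug_report, convert_to_tensor=True)
for name, source in files.items():
    score = util.cos_sim(report_emb, model.encode(source, convert_to_tensor=True))
    print(name, float(score))  # higher score = more likely buggy file
```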
- Towards Weakly-Supervised Hate Speech Classification Across Datasets [47.101942709219784]
We show the effectiveness of a state-of-the-art weakly-supervised text classification model in various in-dataset and cross-dataset settings.
We also conduct an in-depth quantitative and qualitative analysis of the source of poor generalizability of HS classification models.
arXiv Detail & Related papers (2023-05-04T08:15:40Z)
- Auto-labelling of Bug Report using Natural Language Processing [0.0]
Rule- and query-based solutions recommend a long list of potentially similar bug reports with no clear ranking.
In this paper, we have proposed a solution using a combination of NLP techniques.
It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing identical bug reports.
arXiv Detail & Related papers (2022-12-13T02:32:42Z)
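A rough sketch of the retrieval step, assuming the "non-generalizing" method is a nearest-neighbour search over TF-IDF vectors (scikit-learn's term for such instance-based methods); the paper does not spell out this exact configuration.

```python
# Retrieve near-identical bug reports with a nearest-neighbour search over
# TF-IDF vectors; "non-generalizing" is scikit-learn's term for such methods.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

corpus = ["Login fails with LDAP accounts", "Dark mode resets after restart"]
vectorizer = TfidfVectorizer().fit(corpus)
index = NearestNeighbors(n_neighbors=1, metric="cosine").fit(
    vectorizer.transform(corpus))

dist, idx = index.kneighbors(vectorizer.transform(["LDAP login failure"]))
print(corpus[idx[0][0]], 1 - dist[0][0])  # closest report and its similarity
```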
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
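As a toy illustration of what slice detection computes, per-group performance gaps, the sketch below groups by a hypothetical metadata column; DEIM discovers slices automatically rather than using a fixed group-by.

```python
# Toy slice detection: find metadata groups where the model underperforms.
# DEIM discovers slices automatically; this fixed group-by is illustrative.
import pandas as pd

results = pd.DataFrame({
    "length_bucket": ["short", "short", "long", "long"],  # hypothetical key
    "correct": [1, 1, 0, 1],  # per-example model correctness
})
per_slice = results.groupby("length_bucket")["correct"].mean()
print(per_slice[per_slice < results["correct"].mean()])  # weak slices
```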
- Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention [37.67372105858311]
This paper proposes a new automatic classification method for bug reports.
The innovation is that, when categorizing bug reports, the intention of the report is considered in addition to its text information.
Our proposed method achieves better performance, with an F-measure ranging from 87.3% to 95.5%.
arXiv Detail & Related papers (2022-08-02T06:44:51Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
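One classic baseline among such methods flags items where a cross-validated classifier disagrees with the gold label; the sketch below shows that idea on an invented toy dataset and is not one of the paper's 18 reimplementations specifically.

```python
# Annotation-error-detection baseline: flag items whose gold label a
# cross-validated classifier contradicts. The toy dataset is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible plot", "loved it",
         "awful acting", "fantastic film", "boring mess"]
labels = [1, 0, 1, 0, 1, 1]  # the last label is a deliberate annotation error

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
predicted = cross_val_predict(pipeline, texts, labels, cv=2)
suspects = [t for t, p, y in zip(texts, predicted, labels) if p != y]
print(suspects)  # candidate annotation errors for human review
```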
- Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
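As a hedged illustration of generating synthetic data from a knowledge graph, the sketch below verbalizes triples with fixed templates; the triples and templates are invented, and the paper's sampling strategies are precisely what it studies rather than what is shown here.

```python
# Verbalize knowledge-graph triples into synthetic sentences for language
# model adaptation. Triples and templates are invented placeholders.
triples = [("hammer", "UsedFor", "driving nails"),
           ("ice", "HasProperty", "cold")]
templates = {"UsedFor": "A {0} is used for {2}.",
             "HasProperty": "{0} is {2}."}

synthetic = [templates[rel].format(head, rel, tail)
             for head, rel, tail in triples]
print(synthetic)  # feed these sentences to language-model fine-tuning
```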
- Automatic Issue Classifier: A Transfer Learning Framework for Classifying Issue Reports [0.0]
This paper presents our approach to classifying issue reports in a multi-label setting. We use an off-the-shelf neural network called RoBERTa and fine-tune it to classify the issue reports.
arXiv Detail & Related papers (2022-02-12T21:43:08Z)
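A minimal sketch of such a multi-label fine-tuning setup with the HuggingFace transformers library; the label set is an assumption, and training and data loading are omitted.

```python
# Set up RoBERTa for multi-label issue classification with HF transformers.
# The label set is illustrative; the training loop is omitted.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

labels = ["bug", "enhancement", "question"]  # hypothetical label set
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

inputs = tokenizer("App crashes when uploading files", return_tensors="pt")
probs = torch.sigmoid(model(**inputs).logits)  # one probability per label
print(dict(zip(labels, probs[0].tolist())))
```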
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
- Early Detection of Security-Relevant Bug Reports using Machine Learning: How Far Are We? [6.438136820117887]
In a typical maintenance scenario, security-relevant bug reports are prioritised by the development team when preparing corrective patches.
Open security-relevant bug reports can become a critical leak of sensitive information that attackers can leverage to perform zero-day attacks.
In recent years, approaches for the detection of security-relevant bug reports based on machine learning have been reported with promising performance.
arXiv Detail & Related papers (2021-12-19T11:30:29Z)
- S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning.
It is based on a biLSTM encoder and a fully-connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
arXiv Detail & Related papers (2021-03-18T21:10:41Z)
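A compact sketch of a siamese biLSTM similarity model in the spirit of S3M: a shared encoder and a fully-connected head over the combined representations. The dimensions and the feature combination are assumptions, not the paper's exact architecture.

```python
# Siamese biLSTM similarity sketch: a shared encoder embeds each stack trace,
# and a fully-connected head scores the pair. Sizes are illustrative.
import torch
import torch.nn as nn

class SiameseStackEncoder(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(4 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def encode(self, frames):
        _, (h, _) = self.lstm(self.embed(frames))
        return torch.cat([h[-2], h[-1]], dim=-1)  # both LSTM directions

    def forward(self, a, b):
        ea, eb = self.encode(a), self.encode(b)
        return torch.sigmoid(self.head(torch.cat([ea, eb], dim=-1)))

model = SiameseStackEncoder()
trace_a = torch.randint(0, 10_000, (1, 20))  # token ids of stack frames
trace_b = torch.randint(0, 10_000, (1, 20))
print(model(trace_a, trace_b))  # similarity score in (0, 1)
```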
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria to quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)