Related papers: Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention

URL: http://arxiv.org/abs/2208.01274v1
Date: Tue, 2 Aug 2022 06:44:51 GMT
Title: Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention
Authors: Fanqi Meng, Xuesong Wang, Jingdong Wang and Peifang Wang
Abstract summary: This paper proposes a new automatic classification method for bug reports. The innovation is that, when categorizing bug reports, the intention of the report is considered in addition to its text information. The proposed method achieves better performance, with F-Measure ranging from 87.3% to 95.5%.
Score: 37.67372105858311
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid growth of software scale and complexity, a large number of bug reports are submitted to bug tracking systems. To speed up defect repair, these reports need to be accurately classified so that they can be sent to the appropriate developers. However, existing classification methods use only the text information of the bug report, which limits their performance. To address this problem, this paper proposes a new automatic classification method for bug reports. The innovation is that, when categorizing bug reports, the method considers not only the text information of the report but also its intention (i.e., suggestion or explanation), thereby improving classification performance. First, we collect bug reports from four ecosystems (Apache, Eclipse, Gentoo, Mozilla) and manually annotate them to construct an experimental data set. Then, we use Natural Language Processing techniques to preprocess the data. On this basis, BERT and TF-IDF are used to extract features of the intention and of the multiple text information. Finally, the features are used to train the classifiers. Experimental results on five classifiers (K-Nearest Neighbor, Naive Bayes, Logistic Regression, Support Vector Machine, and Random Forest) show that the proposed method achieves better performance, with F-Measure ranging from 87.3% to 95.5%.
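The pipeline described in the abstract (preprocess text, extract text and intention features, train a classifier, report F-Measure) can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the "multiple text information" is assumed to be the title and description, TF-IDF is used for all text features, a one-hot vector of an already-known intention label is a hypothetical stand-in for the paper's BERT-based intention features, K-Nearest Neighbor (one of the five classifiers named) does the classification, and the reports and the labels `crash`/`ui` are hypothetical toy data.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the paper's preprocessing).
STOPWORDS = {"the", "a", "an", "is", "in", "on", "when", "to", "of", "and", "it"}
INTENTIONS = ["suggestion", "explanation"]  # intention classes named in the abstract


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


class TfIdf:
    """Minimal TF-IDF vectorizer: raw term count x smoothed IDF, L2-normalized."""

    def fit(self, docs):
        tokenized = [preprocess(d) for d in docs]
        n = len(tokenized)
        df = Counter(t for toks in tokenized for t in set(toks))
        self.vocab = sorted(df)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in self.vocab}
        return self

    def transform(self, doc):
        counts = Counter(preprocess(doc))  # terms outside the fitted vocabulary are ignored
        vec = [counts[t] * self.idf[t] for t in self.vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


def featurize(report, title_tfidf, desc_tfidf):
    """Concatenate title TF-IDF, description TF-IDF, and a one-hot intention.

    The one-hot intention is a hypothetical stand-in for BERT intention features.
    """
    intent = [1.0 if report["intention"] == c else 0.0 for c in INTENTIONS]
    return (title_tfidf.transform(report["title"])
            + desc_tfidf.transform(report["description"]) + intent)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train_x, train_y, x, k=1):
    """K-Nearest Neighbor by cosine similarity, majority vote over the top k."""
    ranked = sorted(range(len(train_x)), key=lambda i: cosine(train_x[i], x), reverse=True)
    return Counter(train_y[i] for i in ranked[:k]).most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Macro-averaged F-Measure: mean over classes of 2PR / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy reports, for illustration only (not the paper's data set).
train = [
    {"title": "Crash on startup", "description": "Application crashes with null pointer exception on launch", "intention": "explanation", "label": "crash"},
    {"title": "Segfault when saving file", "description": "Process crashes with segmentation fault after save", "intention": "explanation", "label": "crash"},
    {"title": "Button text is cut off", "description": "Label overlaps the toolbar icon in dark theme", "intention": "explanation", "label": "ui"},
    {"title": "Improve dialog layout", "description": "Suggest aligning the toolbar buttons in the settings dialog", "intention": "suggestion", "label": "ui"},
]
test = [
    {"title": "Crash after save", "description": "Segmentation fault crashes the process", "intention": "explanation", "label": "crash"},
    {"title": "Toolbar layout", "description": "Suggest moving toolbar icon in dialog", "intention": "suggestion", "label": "ui"},
]

title_tfidf = TfIdf().fit([r["title"] for r in train])
desc_tfidf = TfIdf().fit([r["description"] for r in train])
train_x = [featurize(r, title_tfidf, desc_tfidf) for r in train]
train_y = [r["label"] for r in train]
predictions = [knn_predict(train_x, train_y, featurize(r, title_tfidf, desc_tfidf)) for r in test]
print(predictions, macro_f1([r["label"] for r in test], predictions))
```

The hand-rolled vectorizer and classifier only illustrate where the intention feature enters: it is appended to the text features before classification, so the classifier can separate reports whose wording is similar but whose intention differs.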

Related papers

Automated Duplicate Bug Report Detection in Large Open Bug Repositories [3.481985817302898]
Many users and contributors of large open-source projects report software defects or enhancement requests (known as bug reports) to the issue-tracking systems. We propose a novel approach based on machine learning methods that can automatically detect duplicate bug reports in an open bug repository.
arXiv Detail & Related papers (2025-04-21T01:55:54Z)
Buggin: Automatic intrinsic bugs classification model using NLP and ML [0.0]
This paper employs Natural Language Processing (NLP) techniques to automatically identify intrinsic bugs. We use two embedding techniques, seBERT and TF-IDF, applied to the title and description text of bug reports. The resulting embeddings are fed into well-established machine learning algorithms such as Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, and K-Nearest Neighbors.
arXiv Detail & Related papers (2025-04-02T16:23:08Z)
Understanding the Impact of Domain Term Explanation on Duplicate Bug Report Detection [2.9312156642007294]
Duplicate bug reports make up 42% of all reports in bug tracking systems (e.g., Bugzilla). Traditional techniques often focus on detecting textually similar duplicates. About 78% of bug reports in open-source projects are very short (e.g., fewer than 100 words), often containing domain-specific terms or jargon.
arXiv Detail & Related papers (2025-03-24T16:09:37Z)
Tgea: An error-annotated dataset and benchmark tasks for text generation from pretrained language models [57.758735361535486]
TGEA is an error-annotated dataset for text generation from pretrained language models (PLMs). We create an error taxonomy to cover 24 types of errors occurring in PLM-generated sentences. This is the first dataset with comprehensive annotations for PLM-generated texts.
arXiv Detail & Related papers (2025-03-06T09:14:02Z)
An Empirical Study on the Classification of Bug Reports with Machine Learning [1.1499574149885023]
We study how different factors (e.g., project language, report content) can influence the performance of models in classifying issue reports. Whether the report title or the description is used makes no significant difference; Support Vector Machine, Logistic Regression, and Random Forest are effective in classifying issue reports. Models trained on heterogeneous projects can classify reports from projects not present during training.
arXiv Detail & Related papers (2025-03-01T23:19:56Z)
SEDAC: A CVAE-Based Data Augmentation Method for Security Bug Report Identification [0.0]
In the real world, the proportion of security bug reports (SBRs) is very low. SEDAC is a new SBR identification method that generates similar bug report vectors. It outperforms all baselines in g-measure, with improvements of about 14.24% to 50.10%.
arXiv Detail & Related papers (2024-01-22T15:53:52Z)
On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization [10.717184444794505]
We investigate the hypothesis that, for end-user-facing applications, connecting information in a bug report with information from the GUI can improve upon existing techniques for bug localization. We curate the largest current dataset of fully localized and reproducible real bugs for Android apps, with corresponding bug reports.
arXiv Detail & Related papers (2023-10-12T07:14:22Z)
A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports [0.0]
Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs. We explored several embedding models such as TF-IDF (Baseline), FastText, Gensim, BERT, and ADA. Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting the appropriate one for this task.
arXiv Detail & Related papers (2023-08-17T21:36:56Z)
Auto-labelling of Bug Report using Natural Language Processing [0.0]
Rule- and query-based solutions recommend a long list of potentially similar bug reports with no clear ranking. In this paper, we propose a solution using a combination of NLP techniques. It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing identical bug reports.
arXiv Detail & Related papers (2022-12-13T02:32:42Z)
Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers. We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets. We define a uniform evaluation setup including a new formalization of the annotation error detection task. We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
Automatic Issue Classifier: A Transfer Learning Framework for Classifying Issue Reports [0.0]
This paper presents our approach to classifying issue reports in a multi-label setting. We use an off-the-shelf neural network called RoBERTa and fine-tune it to classify the issue reports.
arXiv Detail & Related papers (2022-02-12T21:43:08Z)
DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem. The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network. To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning. It is based on a biLSTM encoder and a fully-connected classifier to compute similarity. Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
arXiv Detail & Related papers (2021-03-18T21:10:41Z)
CLARA: Clinical Report Auto-completion [56.206459591367405]
CLinicAl Report Auto-completion (CLARA) is an interactive method that generates reports in a sentence-by-sentence fashion based on doctors' anchor words and partially completed sentences. In our experimental evaluation, CLARA achieved 0.393 CIDEr and 0.248 BLEU-4 on X-ray reports and 0.482 CIDEr and 0.491 BLEU-4 on EEG reports for sentence-level generation.
arXiv Detail & Related papers (2020-02-26T18:45:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
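Several of the related papers listed above (duplicate bug report detection, and the comparison of text embedding models that uses TF-IDF as its baseline) rank existing reports by textual similarity to a new one. The following is a minimal sketch of such a TF-IDF baseline, assuming cosine similarity over raw-count TF-IDF vectors, no stemming or stopword removal, and hypothetical toy report texts; it is not the method of any listed paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (no stemming, no stopwords)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (raw count x smoothed IDF)."""
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    df = Counter(t for ts in toks for t in set(ts))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(ts).items()} for ts in toks]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_similar(new_report, existing_reports, top_k=2):
    """Indices of the top_k existing reports most similar to new_report."""
    vecs = tfidf_vectors(existing_reports + [new_report])  # IDF over corpus + query
    query, corpus = vecs[-1], vecs[:-1]
    ranked = sorted(range(len(corpus)), key=lambda i: cosine(query, corpus[i]), reverse=True)
    return ranked[:top_k]


# Hypothetical toy reports, for illustration only.
existing = [
    "Browser crashes when opening a PDF attachment",
    "Add dark mode to the settings page",
    "Crash when opening PDF files in the viewer",
]
new_report = "Opening a PDF makes the browser crash"
print(rank_similar(new_report, existing))
```

Because the sketch does no stemming, "crash" and "crashes" count as different terms; that is one reason short, jargon-heavy reports are hard for purely lexical baselines.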