Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention
- URL: http://arxiv.org/abs/2208.01274v1
- Date: Tue, 2 Aug 2022 06:44:51 GMT
- Title: Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention
- Authors: Fanqi Meng, Xuesong Wang, Jingdong Wang and Peifang Wang
- Abstract summary: This paper proposes a new automatic classification method for bug reports.
The innovation is that, when categorizing bug reports, the method considers the intention of the report in addition to its text information.
Our proposed method achieves better performance, with an F-Measure ranging from 87.3% to 95.5%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid growth of software scale and complexity, a large number of bug
reports are submitted to bug tracking systems. To speed up defect
repair, these reports need to be accurately classified so that they can be routed
to the appropriate developers. However, existing classification methods
use only the text information of the bug report, which leads to low
performance. To address this problem, this paper proposes a new automatic
classification method for bug reports. The innovation is that, when categorizing
bug reports, the method considers not only the text information of the report but also
its intention (i.e., suggestion or explanation),
thereby improving classification performance. First, we collect bug
reports from four ecosystems (Apache, Eclipse, Gentoo, Mozilla) and manually
annotate them to construct an experimental data set. Then, we use Natural
Language Processing techniques to preprocess the data. On this basis, BERT and
TF-IDF are used to extract features of the intention and of the multiple text
information. Finally, the features are used to train the classifiers. The
experimental results on five classifiers (K-Nearest Neighbor, Naive
Bayes, Logistic Regression, Support Vector Machine, and Random Forest) show
that our proposed method achieves better performance, with an F-Measure ranging
from 87.3% to 95.5%.
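The pipeline in the abstract (preprocess, extract text and intention features, train a classifier, score with F-Measure) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy reports and category labels are invented, whitespace tokenization stands in for the NLP preprocessing, a one-hot intention flag stands in for the BERT-derived intention features, and only one of the five classifiers (K-Nearest Neighbor with cosine similarity) is shown. The helper names (`fit_idf`, `features`, `knn_predict`, `f_measure`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: (report text, intention, category). The two intention
# values come from the abstract; the texts and category labels are invented.
TRAIN = [
    ("crash when opening large file null pointer", "explanation", "bug"),
    ("please add option to export settings", "suggestion", "feature"),
    ("segfault on startup after update", "explanation", "bug"),
    ("suggest supporting dark theme in editor", "suggestion", "feature"),
]
INTENTIONS = ("suggestion", "explanation")


def tokenize(text):
    # Whitespace tokenization stands in for the paper's NLP preprocessing.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}


def tfidf(text, idf):
    # L2-normalized TF-IDF vector as a sparse dict; unseen tokens are dropped.
    tf = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def features(text, intention, idf):
    # Concatenate text features with a one-hot intention feature
    # (a simplified stand-in for the BERT-based features in the abstract).
    vec = tfidf(text, idf)
    for name in INTENTIONS:
        vec["__intent_" + name] = 1.0 if intention == name else 0.0
    return vec


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(x, train_vecs, train_labels, k=1):
    # K-Nearest Neighbor (one of the five classifiers named in the abstract).
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(x, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def f_measure(y_true, y_pred, positive):
    # F-Measure for one class: harmonic mean of precision and recall.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


idf = fit_idf([text for text, _, _ in TRAIN])
train_vecs = [features(text, intent, idf) for text, intent, _ in TRAIN]
train_labels = [label for _, _, label in TRAIN]

# Hypothetical held-out reports.
TEST = [
    ("null pointer crash in parser", "explanation", "bug"),
    ("please add export to csv", "suggestion", "feature"),
]
preds = [knn_predict(features(text, intent, idf), train_vecs, train_labels)
         for text, intent, _ in TEST]
print(preds, f_measure([label for _, _, label in TEST], preds, "bug"))
```

The intention flag is concatenated with the same weight as the normalized TF-IDF vector, so in this toy setup it pulls each report strongly toward training reports with the same intention; a real system would tune or learn that weighting. The `f_measure` helper scores a single class; the abstract does not say whether its 87.3% to 95.5% figures are per-class or averaged.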
Related papers
- SEDAC: A CVAE-Based Data Augmentation Method for Security Bug Report Identification [0.0]
In the real world, security bug reports (SBRs) make up a very small proportion of all bug reports.
SEDAC is a new SBR identification method that generates similar bug report vectors.
It outperforms all the baselines in g-measure, with improvements of around 14.24% to 50.10%.
(2024-01-22T15:53:52Z)
- On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization [10.717184444794505]
We investigate the hypothesis that, for end-user-facing applications, connecting information in a bug report with information from the GUI can improve upon existing techniques for bug localization.
We source the largest dataset to date of fully localized, reproducible real bugs for Android apps, along with their corresponding bug reports.
(2023-10-12T07:14:22Z)
- A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports [0.0]
Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs.
We explored several embedding models, such as TF-IDF (baseline), FastText, Gensim, BERT, and ADA.
Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting the appropriate one for this task.
(2023-08-17T21:36:56Z)
- Auto-labelling of Bug Report using Natural Language Processing [0.0]
Rule- and query-based solutions recommend a long list of potentially similar bug reports with no clear ranking.
In this paper, we propose a solution using a combination of NLP techniques.
It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing identical bug reports.
(2022-12-13T02:32:42Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
(2022-11-11T16:37:33Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
(2022-06-05T22:31:45Z)
- Automatic Issue Classifier: A Transfer Learning Framework for Classifying Issue Reports [0.0]
This paper presents our approach to classifying issue reports in a multi-label setting.
We use an off-the-shelf neural network called RoBERTa and fine-tune it to classify the issue reports.
(2022-02-12T21:43:08Z)
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
(2022-01-14T00:16:57Z)
- S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M, the first approach to computing stack trace similarity based on deep learning.
It uses a biLSTM encoder and a fully connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
(2021-03-18T21:10:41Z)
- CLARA: Clinical Report Auto-completion [56.206459591367405]
CLinicAl Report Auto-completion (CLARA) is an interactive method that generates reports sentence by sentence based on doctors' anchor words and partially completed sentences.
In our experimental evaluation, CLARA achieved 0.393 CIDEr and 0.248 BLEU-4 on X-ray reports and 0.482 CIDEr and 0.491 BLEU-4 on EEG reports for sentence-level generation.
(2020-02-26T18:45:00Z)
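The SEDAC summary in the list above reports improvements in g-measure without defining the metric. In security bug report prediction work, g-measure is commonly defined as the harmonic mean of the probability of detection (pd, i.e., recall) and one minus the probability of false alarm (pf); the summary does not confirm that SEDAC uses exactly this definition, so treat the following as an assumption. With TP, FP, TN, and FN counted for the SBR class:

$$
\mathrm{pd} = \frac{TP}{TP + FN}, \qquad
\mathrm{pf} = \frac{FP}{FP + TN}, \qquad
G = \frac{2 \cdot \mathrm{pd} \cdot (1 - \mathrm{pf})}{\mathrm{pd} + (1 - \mathrm{pf})}
$$

For example, pd = 0.8 and pf = 0.2 give G = (2 · 0.8 · 0.8) / (0.8 + 0.8) = 0.8. Compared with F-Measure, g-measure replaces precision with 1 − pf, so false positives are weighed against the large non-security class rather than against the reports predicted as SBRs.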