Can GitHub Issues Help in App Review Classifications?
- URL: http://arxiv.org/abs/2308.14211v3
- Date: Wed, 24 Jul 2024 04:04:26 GMT
- Title: Can GitHub Issues Help in App Review Classifications?
- Authors: Yasaman Abedini, Abbas Heydarnoori
- Abstract summary: We propose a novel approach that assists in augmenting labeled datasets by utilizing information extracted from GitHub issues.
Our results demonstrate that using labeled issues for data augmentation can improve the F1-score by 6.3 in bug reports and 7.2 in feature requests.
- Score: 0.7366405857677226
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: App reviews reflect various user requirements that can aid in planning maintenance tasks. Recently, proposed approaches for automatically classifying user reviews rely on machine learning algorithms. A previous study demonstrated that models trained on existing labeled datasets exhibit poor performance when predicting new ones. Therefore, a comprehensive labeled dataset is essential to train a more precise model. In this paper, we propose a novel approach that assists in augmenting labeled datasets by utilizing information extracted from an additional source, GitHub issues, that contains valuable information about user requirements. First, we identify issues concerning review intentions (bug reports, feature requests, and others) by examining the issue labels. Then, we analyze issue bodies and define 19 language patterns for extracting targeted information. Finally, we augment the manually labeled review dataset with a subset of processed issues through the Within-App, Within-Context, and Between-App Analysis methods. We conducted several experiments to evaluate the proposed approach. Our results demonstrate that using labeled issues for data augmentation can improve the F1-score by 6.3 in bug reports and 7.2 in feature requests. Furthermore, we identify an effective range of 0.3 to 0.7 for the auxiliary volume, which provides better performance improvements.
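The sketch below illustrates the general idea described in the abstract: GitHub issues are first mapped to review intentions via their labels, a few language patterns then pull requirement-like sentences out of the issue body, and the extracted sentences are appended to the labeled review dataset. The label names, regular expressions, and function names are illustrative assumptions only; they do not reproduce the paper's actual 19 language patterns or its Within-App, Within-Context, and Between-App Analysis methods.

```python
import re

# Hypothetical mapping from GitHub issue labels to review intentions
# (illustrative only; the paper derives intentions from issue labels,
# but these exact label names are assumptions).
LABEL_TO_INTENTION = {
    "bug": "bug report",
    "defect": "bug report",
    "enhancement": "feature request",
    "feature": "feature request",
}

# Toy language patterns for extracting requirement-like sentences from
# an issue body (stand-ins for the paper's 19 patterns).
PATTERNS = [
    re.compile(r"(?i)\bsteps to reproduce\b.*"),
    re.compile(r"(?i)\bit would be (great|nice) if\b.*"),
    re.compile(r"(?i)\bthe app (crashes|fails)\b.*"),
]

def classify_issue(labels):
    """Return the review intention implied by an issue's labels, if any."""
    for label in labels:
        intention = LABEL_TO_INTENTION.get(label.lower())
        if intention:
            return intention
    return None

def extract_sentences(body):
    """Keep only sentences that match one of the language patterns."""
    sentences = re.split(r"(?<=[.!?])\s+", body)
    return [s for s in sentences if any(p.search(s) for p in PATTERNS)]

def augment_reviews(labeled_reviews, issues):
    """Append processed issue sentences to the labeled review dataset."""
    augmented = list(labeled_reviews)
    for issue in issues:
        intention = classify_issue(issue["labels"])
        if intention is None:
            continue
        for sentence in extract_sentences(issue["body"]):
            augmented.append({"text": sentence, "label": intention})
    return augmented

# Minimal usage example with made-up data.
reviews = [{"text": "The app crashes on startup.", "label": "bug report"}]
issues = [{"labels": ["bug"],
           "body": "The app crashes when I rotate the screen. Please fix."}]
print(augment_reviews(reviews, issues))
```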
Related papers
- Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores [12.86467344792873]
The proposed method uses soft labels derived from expert-augmented scores to fine-tune embedding models.
The paper evaluates the method using a Q&A dataset from an online shopping website and eight expert models.
arXiv Detail & Related papers (2024-08-19T01:59:25Z) - Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z) - Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention [37.67372105858311]
This paper proposes a new automatic classification method for bug reports.
The innovation is that, when categorizing bug reports, the intention of the report is considered in addition to its text information.
The proposed method achieves better performance, with an F-Measure ranging from 87.3% to 95.5%.
arXiv Detail & Related papers (2022-08-02T06:44:51Z) - Towards a Data-Driven Requirements Engineering Approach: Automatic Analysis of User Reviews [0.440401067183266]
We provide an automated analysis using CamemBERT, which is a state-of-the-art language model in French.
We created a multi-label classification dataset of 6000 user reviews from three applications in the Health & Fitness field.
The results are encouraging and suggest that reviews concerning requests for new features can be identified automatically.
arXiv Detail & Related papers (2022-06-29T14:14:54Z) - Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z) - Efficient Few-Shot Fine-Tuning for Opinion Summarization [83.76460801568092]
Abstractive summarization models are typically pre-trained on large amounts of generic texts, then fine-tuned on tens or hundreds of thousands of annotated samples.
We show that a few-shot method based on adapters can easily store in-domain knowledge.
We show that this self-supervised adapter pre-training improves summary quality over standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp datasets.
arXiv Detail & Related papers (2022-05-04T16:38:37Z) - DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z) - Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z) - Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews [2.66512000865131]
We study the accuracy and time efficiency of pre-trained neural language models (PTMs) for app review classification.
We set up different studies to evaluate PTMs in multiple settings.
In all cases, micro and macro Precision, Recall, and F1-scores are used.
arXiv Detail & Related papers (2021-04-12T23:23:45Z) - How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.