Evaluating Pre-Trained Models for User Feedback Analysis in Software
Engineering: A Study on Classification of App-Reviews
- URL: http://arxiv.org/abs/2104.05861v1
- Date: Mon, 12 Apr 2021 23:23:45 GMT
- Title: Evaluating Pre-Trained Models for User Feedback Analysis in Software
Engineering: A Study on Classification of App-Reviews
- Authors: Mohammad Abdul Hadi, Fatemeh H. Fard
- Abstract summary: We study the accuracy and time efficiency of pre-trained neural language models (PTMs) for app review classification.
We set up different studies to evaluate PTMs in multiple settings.
In all cases, Micro and Macro Precision, Recall, and F1-scores will be used.
- Score: 2.66512000865131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Context: Mobile app reviews written by users on app stores or social media
are significant resources for app developers. Analyzing app reviews has proved
to be useful for many areas of software engineering (e.g., requirement
engineering, testing). Automatic classification of app reviews requires
extensive efforts to manually curate a labeled dataset. When the classification
purpose changes (e.g., identifying bugs versus usability issues or sentiment),
new datasets should be labeled, which prevents the extensibility of the
developed models for new desired classes/tasks in practice. Recent pre-trained
neural language models (PTMs) are trained on large corpora in an unsupervised
manner and have found success in solving similar Natural Language Processing
problems. However, the applicability of PTMs has not been explored for app review
classification. Objective: We investigate the benefits of PTMs for app review
classification compared to the existing models, as well as the transferability
of PTMs in multiple settings. Method: We empirically study the accuracy and
time efficiency of PTMs compared to prior approaches using six datasets from
literature. In addition, we investigate the performance of the PTMs trained on
app reviews (i.e., domain-specific PTMs). We set up different studies to
evaluate PTMs in multiple settings: binary vs. multi-class classification,
zero-shot classification (when new labels are introduced to the model),
multi-task setting, and classification of reviews from different resources. The
datasets are manually labeled app review datasets from Google Play Store, Apple
App Store, and Twitter data. In all cases, Micro and Macro Precision, Recall,
and F1-scores will be used and we will report the time required for training
and prediction with the models.
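The abstract names micro- and macro-averaged Precision, Recall, and F1 as the evaluation metrics. The following is a minimal sketch of how the two averaging schemes differ for single-label, multi-class review classification. It is not the authors' evaluation code: the function name `micro_macro_scores` and the example labels (`bug`, `feature`, `other`) are hypothetical, and macro F1 is assumed here to be the unweighted mean of the per-class F1 scores.

```python
from collections import Counter


def micro_macro_scores(y_true, y_pred):
    """Return micro- and macro-averaged (precision, recall, F1) tuples.

    Assumes single-label classification: one gold label and one
    predicted label per review.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative

    def prf(tp_, fp_, fn_):
        # Undefined ratios (zero denominator) are reported as 0.0.
        precision = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        recall = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1

    # Macro: score each class separately, then average with equal weight.
    per_class = [prf(tp[c], fp[c], fn[c]) for c in labels]
    macro = tuple(sum(s[i] for s in per_class) / len(labels) for i in range(3))

    # Micro: pool the counts over all classes, then score once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return {"micro": micro, "macro": macro}


# Hypothetical, imbalanced toy data: the rare class "other" is never predicted.
y_true = ["bug", "bug", "bug", "bug", "feature", "other"]
y_pred = ["bug", "bug", "bug", "feature", "feature", "bug"]
scores = micro_macro_scores(y_true, y_pred)
print("micro P/R/F1:", scores["micro"])
print("macro P/R/F1:", scores["macro"])
```

In this toy example the micro scores are dominated by the frequent `bug` class, while the macro scores drop because the missed `other` class counts as much as any other class. That sensitivity to rare classes is one reason to report both on imbalanced review datasets.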
Related papers
- Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks.
These resources lack a classification tailored to Software Engineering (SE) needs.
We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z)
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- Automated categorization of pre-trained models for software engineering: A case study with a Hugging Face dataset [9.218130273952383]
Software engineering activities have been revolutionized by the advent of pre-trained models (PTMs).
The Hugging Face (HF) platform simplifies the use of PTMs by collecting, storing, and curating several models.
This paper introduces an approach to enable the automatic classification of PTMs for SE tasks.
arXiv Detail & Related papers (2024-05-21T20:26:17Z)
- Continual Learning with Pre-Trained Models: A Survey [61.97613090666247]
Continual Learning aims to overcome the catastrophic forgetting of former knowledge when learning new ones.
This paper presents a comprehensive survey of the latest advancements in PTM-based CL.
arXiv Detail & Related papers (2024-01-29T18:27:52Z)
- T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews [5.235401361674881]
We present T-FREX, a Transformer-based, fully automatic approach for mobile app review feature extraction.
First, we collect a set of ground truth features from users in a real crowdsourced software recommendation platform.
Then, we use this newly created dataset to fine-tune multiple LLMs on a named entity recognition task.
arXiv Detail & Related papers (2024-01-08T11:43:03Z)
- Can GitHub Issues Help in App Review Classifications? [0.7366405857677226]
We propose a novel approach that assists in augmenting labeled datasets by utilizing information extracted from GitHub issues.
Our results demonstrate that using labeled issues for data augmentation can improve the F1-score to 6.3 in bug reports and 7.2 in feature requests.
arXiv Detail & Related papers (2023-08-27T22:01:24Z)
- Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need [84.3507610522086]
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones.
Recent pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL.
We argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transferring.
arXiv Detail & Related papers (2023-03-13T17:59:02Z)
- Pre-Trained Neural Language Models for Automatic Mobile App User Feedback Answer Generation [9.105367401167129]
Studies show that developers' answers to the mobile app users' feedbacks on app stores can increase the apps' star rating.
To help app developers generate answers that are related to the users' issues, recent studies develop models to generate the answers automatically.
In this paper, we evaluate pre-trained neural language models (PTMs) to generate replies to the mobile app user feedbacks.
arXiv Detail & Related papers (2022-02-04T18:26:55Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)