Exploring Requirements Elicitation from App Store User Reviews Using Large Language Models
- URL: http://arxiv.org/abs/2409.15473v1
- Date: Mon, 23 Sep 2024 18:57:31 GMT
- Title: Exploring Requirements Elicitation from App Store User Reviews Using Large Language Models
- Authors: Tanmai Kumar Ghosh, Atharva Pargaonkar, Nasir U. Eisty
- Abstract summary: This research introduces an approach leveraging the power of Large Language Models to analyze user reviews for automated requirements elicitation.
We fine-tuned three well-established LLMs (BERT, DistilBERT, and GEMMA) on a dataset of app reviews labeled for usefulness.
Our evaluation revealed BERT's superior performance, achieving an accuracy of 92.40% and an F1-score of 92.39%, demonstrating its effectiveness in accurately classifying useful reviews.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mobile applications have become indispensable companions in our daily lives. Spanning categories from communication and entertainment to healthcare and finance, these applications influence nearly every aspect of daily life. Despite their omnipresence, developing apps that meet user needs and expectations remains a challenge. Traditional requirements elicitation methods like user interviews can be time-consuming and suffer from limited scope and subjectivity. This research introduces an approach leveraging the power of Large Language Models (LLMs) to analyze user reviews for automated requirements elicitation. We fine-tuned three well-established LLMs (BERT, DistilBERT, and GEMMA) on a dataset of app reviews labeled for usefulness. Our evaluation revealed BERT's superior performance, achieving an accuracy of 92.40% and an F1-score of 92.39%, demonstrating its effectiveness in accurately classifying useful reviews. While GEMMA displayed a lower overall performance, it excelled in recall (93.39%), indicating its potential for capturing a comprehensive set of valuable user insights. These findings suggest that LLMs offer a promising avenue for streamlining requirements elicitation in mobile app development, leading to the creation of more user-centric and successful applications.
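The abstract describes the fine-tuning setup only at a high level, so the following is a minimal sketch of how such usefulness classification is commonly set up with the Hugging Face transformers and datasets libraries; the checkpoint name, toy data, hyperparameters, and metric wiring are illustrative assumptions, not the authors' configuration. GEMMA, being a decoder-only model, would typically need a different adaptation recipe (e.g., parameter-efficient tuning) that this sketch does not cover.

```python
# Minimal sketch: fine-tune an encoder model to classify app reviews as
# useful (1) or not useful (0) for requirements elicitation.
# All data and hyperparameters below are illustrative assumptions.
from datasets import Dataset
from sklearn.metrics import accuracy_score, f1_score, recall_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"  # or "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical labeled reviews; the paper's dataset is labeled for usefulness.
data = Dataset.from_dict({
    "text": ["Please add an option to export data as CSV.", "Five stars, great app!"],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Accuracy, F1, and recall mirror the metrics reported in the abstract.
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds),
            "recall": recall_score(labels, preds)}

args = TrainingArguments(output_dir="usefulness-clf", num_train_epochs=3,
                         per_device_train_batch_size=16, report_to="none")
trainer = Trainer(model=model, args=args, train_dataset=data,
                  eval_dataset=data, compute_metrics=compute_metrics)
trainer.train()
print(trainer.evaluate())
```

In real use, evaluation would run on a held-out test split rather than the training data, which is reused here only to keep the sketch self-contained.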
Related papers
- MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models [71.36392373876505]
We introduce MMIE, a large-scale benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs).
MMIE comprises 20K meticulously curated multimodal queries, spanning 3 categories, 12 fields, and 102 subfields, including mathematics, coding, physics, literature, health, and arts.
It supports both interleaved inputs and outputs, offering a mix of multiple-choice and open-ended question formats to evaluate diverse competencies.
arXiv Detail & Related papers (2024-10-14T04:15:00Z)
- LLM-Cure: LLM-based Competitor User Review Analysis for Feature Enhancement [0.7285835869818668]
We propose a large language model (LLM)-based Competitive User Review Analysis for Feature Enhancement.
LLM-Cure identifies and categorizes features within reviews by applying LLMs.
When provided with a complaint in a user review, LLM-Cure curates highly rated (4 and 5 stars) reviews in competing apps related to the complaint (see the sketch after this list).
arXiv Detail & Related papers (2024-09-24T04:17:21Z)
- Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation [38.98478510165569]
This paper introduces a novel framework that leverages Large Language Models (LLMs) to automate and enhance the requirements elicitation process.
LLMs are used to generate a vast array of simulated users (LLM agents), enabling the exploration of a much broader range of user needs.
arXiv Detail & Related papers (2024-04-04T17:36:29Z) - Fairness Concerns in App Reviews: A Study on AI-based Mobile Apps [9.948068408730654]
This research aims to investigate fairness concerns raised in mobile app reviews.
Our research focuses on AI-based mobile app reviews as the chance of unfair behaviors and outcomes in AI-based apps may be higher than in non-AI-based apps.
arXiv Detail & Related papers (2024-01-16T03:43:33Z) - GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language
Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z) - Large Language Models as Automated Aligners for benchmarking
Vision-Language Models [48.4367174400306]
Vision-Language Models (VLMs) have reached a new level of sophistication, showing notable competence in executing intricate cognition and reasoning tasks.
Existing evaluation benchmarks, primarily relying on rigid, hand-crafted datasets, face significant limitations in assessing the alignment of these increasingly anthropomorphic models with human intelligence.
In this work, we address these limitations via Auto-Bench, which explores LLMs as proficient curators, measuring the alignment between VLMs and human intelligence and values through automatic data curation and assessment.
arXiv Detail & Related papers (2023-11-24T16:12:05Z) - MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria [49.500322937449326]
Multimodal large language models (MLLMs) have broadened the scope of AI applications.
Existing automatic evaluation methodologies for MLLMs are mainly limited to evaluating queries without considering user experience.
We propose a new evaluation paradigm for MLLMs: evaluating MLLMs against per-sample criteria using a potent MLLM as the judge.
arXiv Detail & Related papers (2023-11-23T12:04:25Z)
- AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization [5.4138734778206]
The rapid growth of information on the Internet has led to an overwhelming amount of opinions and comments on various activities, products, and services.
This makes it difficult and time-consuming for users to process all the available information when making decisions.
We propose an Aspect-adaptive Knowledge-based Opinion Summarization model for product reviews.
arXiv Detail & Related papers (2023-05-26T03:44:35Z)
- Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs).
In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol.
We propose an interactive evaluation approach based on LLMs, named iEvaLM, that harnesses LLM-based user simulators.
arXiv Detail & Related papers (2023-05-22T15:12:43Z)
- Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT.
Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version.
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
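As a rough illustration of the curation step described in the LLM-Cure entry above: given a complaint, keep only highly rated (4 and 5 star) competitor reviews and rank them by relevance. LLM-Cure applies LLMs for feature identification and matching; this sketch substitutes TF-IDF cosine similarity as a simple stand-in, and the function name and data are hypothetical.

```python
# Hypothetical sketch of LLM-Cure-style curation: filter competitor reviews
# to 4-5 stars and rank by similarity to the complaint. TF-IDF similarity is
# a stand-in for the LLM-based feature matching the paper describes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def curate_competitor_reviews(complaint, competitor_reviews, top_k=5):
    """competitor_reviews: list of (review_text, star_rating) tuples."""
    highly_rated = [text for text, stars in competitor_reviews if stars >= 4]
    if not highly_rated:
        return []
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([complaint] + highly_rated)
    # Row 0 is the complaint; compare it against every highly rated review.
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    ranked = sorted(zip(highly_rated, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]

reviews = [("The new dark theme is easy on the eyes.", 5),
           ("Crashes every time I open it.", 1),
           ("Dark mode and font scaling work great.", 4)]
print(curate_competitor_reviews("This app badly needs a dark mode.", reviews))
```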
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.