Mobile Application Review Summarization using Chain of Density Prompting
- URL: http://arxiv.org/abs/2506.14192v1
- Date: Tue, 17 Jun 2025 05:17:21 GMT
- Title: Mobile Application Review Summarization using Chain of Density Prompting
- Authors: Shristi Shrestha, Anas Mahmoud
- Abstract summary: We leverage Large Language Models (LLMs) to summarize mobile app reviews. We use the Chain of Density (CoD) prompt to guide OpenAI GPT-4 to generate abstractive, semantically dense, and easily interpretable summaries.
- Score: 1.90298817989995
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Mobile app users commonly rely on app store ratings and reviews to find apps that suit their needs. However, the sheer volume of reviews available on app stores can lead to information overload, thus impeding users' ability to make informed app selection decisions. To address this challenge, we leverage Large Language Models (LLMs) to summarize mobile app reviews. In particular, we use the Chain of Density (CoD) prompt to guide OpenAI GPT-4 to generate abstractive, semantically dense, and easily interpretable summaries of mobile app reviews. The CoD prompt is engineered to iteratively extract salient entities from the source text and fuse them into a fixed-length summary. We evaluate the performance of our approach using a large dataset of mobile app reviews. We further conduct an empirical evaluation with 48 study participants to assess the readability of the generated summaries. Our results demonstrate that adapting the CoD prompt to focus on app features improves its ability to extract key themes from user reviews and generate natural language summaries tailored for end-user consumption. The prompt also manages to maintain the readability of the generated summaries while increasing their semantic density. Our work in this paper aims to improve mobile app users' experience by providing an effective mechanism for summarizing important user feedback in the review stream.
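As a rough sketch of the approach described above, the snippet below drives a Chain of Density loop through the OpenAI chat API, asking the model to repeatedly identify missing salient app features and fuse them into a fixed-length summary. The prompt wording, iteration count, and decoding settings are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal Chain of Density (CoD) sketch for app-review summarization.
# Assumes OPENAI_API_KEY is set; the prompt text and 5-iteration count
# are illustrative guesses, not the paper's exact protocol.
from openai import OpenAI

client = OpenAI()

COD_PROMPT = """You will generate increasingly dense summaries of the app reviews below.
Repeat the following 2 steps 5 times:
Step 1. Identify 1-3 salient app features from the reviews that are missing
from the previously generated summary.
Step 2. Write a new summary of identical length that covers every feature
from the previous summary plus the missing ones.
A missing feature is relevant, specific, and not yet in the summary.
Make space by fusing, compressing, and removing uninformative phrases.
Answer with a JSON list of dicts with keys
"missing_features" and "denser_summary".

Reviews:
{reviews}"""

def cod_summarize(reviews: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # model named in the paper; any chat model works here
        messages=[{"role": "user", "content": COD_PROMPT.format(reviews=reviews)}],
        temperature=0,
    )
    return response.choices[0].message.content
```

The fixed summary length is what forces density to increase: each iteration must absorb newly extracted features without growing the text.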
Related papers
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews [74.87393214734114]
This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories. Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting, but instruction-based fine-tuning on the dataset boosts performance by 10-20 points.
arXiv Detail & Related papers (2025-04-15T10:07:33Z)
- From Voice to Value: Leveraging AI to Enhance Spoken Online Reviews on the Go [21.811104609265158]
We developed Vocalizer, a mobile application that enables users to provide reviews through voice input. Our findings show that users frequently utilized the AI agent to add more detailed information to their reviews. We also show how interactive AI features can improve users' self-efficacy and willingness to share reviews online.
arXiv Detail & Related papers (2024-12-06T21:59:47Z)
- Exploring Requirements Elicitation from App Store User Reviews Using Large Language Models [0.0]
This research introduces an approach that leverages Large Language Models to analyze user reviews for automated requirements elicitation.
We fine-tuned three well-established LLMs, BERT, DistilBERT, and GEMMA, on a dataset of app reviews labeled for usefulness.
Our evaluation revealed BERT's superior performance, achieving an accuracy of 92.40% and an F1-score of 92.39%, demonstrating its effectiveness in accurately classifying useful reviews.
arXiv Detail & Related papers (2024-09-23T18:57:31Z)
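A minimal sketch of the review-usefulness classification in the entry above: fine-tuning BERT as a binary classifier with the Hugging Face Trainer. The toy data, column names, and hyperparameters are illustrative assumptions, not the study's setup.

```python
# Fine-tune BERT to classify app reviews as useful/not useful for
# requirements elicitation. Toy data and hyperparameters are illustrative.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

# Tiny stand-in dataset; the study's labeled review dataset is not reproduced here.
data = Dataset.from_dict({
    "text": ["Crashes every time I open the camera.", "Nice app!!!"],
    "label": [1, 0],  # 1 = useful for elicitation, 0 = not useful
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Truncate/pad reviews to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data.map(tokenize, batched=True),
)
trainer.train()
```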
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- Zero-shot Bilingual App Reviews Mining with Large Language Models [0.7340017786387767]
Mini-BAR is a tool that integrates large language models (LLMs) to perform zero-shot mining of user reviews in both English and French.
To evaluate the performance of Mini-BAR, we created a dataset containing 6,000 English and 6,000 French annotated user reviews.
arXiv Detail & Related papers (2023-11-06T12:36:46Z)
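Zero-shot review mining as in the Mini-BAR entry above can be sketched with a single classification prompt that handles either language without task-specific training. The label set, prompt, and model choice are illustrative assumptions rather than Mini-BAR's actual pipeline.

```python
# Zero-shot classification of an app review in English or French.
# Labels, prompt, and model are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
LABELS = ["bug report", "feature request", "user experience", "rating only"]

def classify_review(review: str) -> str:
    prompt = (f"Classify this app review into one of {LABELS}. "
              "The review may be in English or French.\n"
              f"Review: {review}\nLabel:")
    response = client.chat.completions.create(
        model="gpt-4",  # stand-in; any instruction-following LLM works
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(classify_review("L'application plante au démarrage depuis la mise à jour."))
```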
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Proactive Prioritization of App Issues via Contrastive Learning [2.6763498831034043]
We propose a new framework, PPrior, that enables proactive prioritization of app issues through identifying prominent reviews.
PPrior employs a pre-trained T5 model and works in three phases.
Phase one adapts the pre-trained T5 model to the user review data in a self-supervised fashion.
In phase two, we leverage contrastive training to learn a generic and task-independent representation of user reviews.
arXiv Detail & Related papers (2023-03-12T06:23:10Z)
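The contrastive phase described in the PPrior entry above can be sketched with a generic InfoNCE-style objective over paired review embeddings. The pairing strategy, temperature, and random stand-in embeddings below are illustrative assumptions, not PPrior's exact formulation.

```python
# Generic InfoNCE-style contrastive loss for learning review representations.
# Pairing strategy and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """anchor, positive: (batch, dim) embeddings of paired reviews.
    Each anchor's positive is the same-index row; the remaining rows
    in the batch serve as in-batch negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.T / temperature  # pairwise cosine similarities
    targets = torch.arange(anchor.size(0))      # matching index = positive pair
    return F.cross_entropy(logits, targets)

# Usage: in PPrior the embeddings would come from the adapted T5 encoder
# (phase one); random tensors stand in for them here.
a = torch.randn(8, 512, requires_grad=True)
p = torch.randn(8, 512, requires_grad=True)
loss = info_nce(a, p)
loss.backward()
```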
- Prompted Opinion Summarization with GPT-3.5 [115.95460650578678]
We show that GPT-3.5 models achieve very strong performance in human evaluation.
We argue that standard evaluation metrics do not reflect this, and introduce three new metrics targeting faithfulness, factuality, and genericity.
arXiv Detail & Related papers (2022-11-29T04:06:21Z)
- Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT.
Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version.
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
- Automating App Review Response Generation [67.58267006314415]
We propose a novel approach, RRGen, that automatically generates review responses by learning knowledge relations between reviews and their responses.
Experiments on 58 apps and 309,246 review-response pairs highlight that RRGen outperforms the baselines by at least 67.4% in terms of BLEU-4.
arXiv Detail & Related papers (2020-02-10T05:23:38Z)
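BLEU-4, the metric cited in the RRGen entry above, measures 1- through 4-gram overlap between a generated response and a reference. A minimal NLTK sketch follows; the tokenization, smoothing choice, and example strings are illustrative, not the paper's exact evaluation setup.

```python
# Score a generated review response against a reference with BLEU-4.
# Whitespace tokenization and method1 smoothing are illustrative choices.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "thanks for the report , the crash is fixed in version 2.1".split()
candidate = "thanks for reporting , we fixed the crash in version 2.1".split()

score = sentence_bleu(
    [reference], candidate,
    weights=(0.25, 0.25, 0.25, 0.25),  # uniform 1- to 4-gram weights
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {score:.3f}")
```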