Related papers: Beyond Stars: Bridging the Gap Between Ratings and Review Sentiment with LLM

Beyond Stars: Bridging the Gap Between Ratings and Review Sentiment with LLM

URL: http://arxiv.org/abs/2509.20953v1
Date: Thu, 25 Sep 2025 09:39:12 GMT
Title: Beyond Stars: Bridging the Gap Between Ratings and Review Sentiment with LLM
Authors: Najla Zuhir, Amna Mohammad Salim, Parvathy Premkumar, Moshiur Farazi,
Abstract summary: We present an advanced approach to mobile app review analysis aimed at addressing limitations inherent in traditional star-rating systems.<n>We propose a modular framework leveraging large language models (LLMs) enhanced by structured prompting techniques.<n>Our method quantifies discrepancies between numerical ratings and textual sentiment, extracts detailed, feature-level insights, and supports interactive exploration of reviews.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present an advanced approach to mobile app review analysis aimed at addressing limitations inherent in traditional star-rating systems. Star ratings, although intuitive and popular among users, often fail to capture the nuanced feedback present in detailed review texts. Traditional NLP techniques -- such as lexicon-based methods and classical machine learning classifiers -- struggle to interpret contextual nuances, domain-specific terminology, and subtle linguistic features like sarcasm. To overcome these limitations, we propose a modular framework leveraging large language models (LLMs) enhanced by structured prompting techniques. Our method quantifies discrepancies between numerical ratings and textual sentiment, extracts detailed, feature-level insights, and supports interactive exploration of reviews through retrieval-augmented conversational question answering (RAG-QA). Comprehensive experiments conducted on three diverse datasets (AWARE, Google Play, and Spotify) demonstrate that our LLM-driven approach significantly surpasses baseline methods, yielding improved accuracy, robustness, and actionable insights in challenging and context-rich review scenarios.

Related papers

Learning Contextual Retrieval for Robust Conversational Search [34.74877456870482]
ContextualRetriever is a novel LLM-based retriever that directly incorporates conversational context into the retrieval process.<n>Our approach introduces: (1) a context-aware embedding mechanism that highlights the current query within the dialogue history; (2) intent-guided supervision based on high-quality rewritten queries; and (3) a training strategy that preserves the generative capabilities of the base LLM.
arXiv Detail & Related papers (2025-09-24T02:17:37Z)
Readability Formulas, Systems and LLMs are Poor Predictors of Reading Ease [4.868319717279586]
We focus on a fundamental and understudied aspect of readability, real-time reading ease, captured with online reading measures using eye tracking.<n>Applying this evaluation to prominent traditional readability formulas, modern machine learning systems and commercial systems used in education, suggests that they are all poor predictors of reading ease in English.
arXiv Detail & Related papers (2025-02-16T14:51:44Z)
Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation [4.1164810106780925]
This study investigates the use of Large Language Models (LLMs) to improve Word Sense Disambiguation (WSD)<n>The proposed method incorporates a human-in-loop approach for prompt augmentation where prompt is supported by Part-of-Speech (POS) tagging, synonyms of ambiguous words, aspect-based sense filtering and few-shot prompting.<n>By utilizing a few-shot Chain of Thought (COT) prompting-based approach, this work demonstrates a substantial improvement in performance.
arXiv Detail & Related papers (2024-11-27T13:35:32Z)
Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation. Our approach can be applied to existing datasets by automatically generating hard negative test captions. Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
A Controlled Study on Long Context Extension and Generalization in LLMs [85.4758128256142]
Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts. We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data.
arXiv Detail & Related papers (2024-09-18T17:53:17Z)
Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding [11.5386284281652]
We introduce a novel approach that re-imagines information retrieval through dynamic in-context editing. By treating lengthy contexts as malleable external knowledge, our method interactively gathers and integrates relevant information. Experimental results demonstrate that our method effectively empowers context-limited LLMs to engage in multi-hop reasoning with improved performance.
arXiv Detail & Related papers (2024-06-18T06:54:28Z)
SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation [23.203761925540736]
We propose a novel framework SLIDE (Small and Large Integrated for Dialogue Evaluation) Our approach achieves state-of-the-art performance in both the classification and evaluation tasks, and additionally the SLIDE exhibits better correlation with human evaluators.
arXiv Detail & Related papers (2024-05-24T20:32:49Z)
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges [57.88520765782177]
Large Language Models (LLMs) have opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to understand and compare these methods. By discussing unresolved challenges, including bias, robustness, domain-specificity, and unified evaluation, this paper seeks to offer insights to researchers and advocate for fairer and more advanced NLG evaluation techniques.
arXiv Detail & Related papers (2024-01-13T15:59:09Z)
Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs) Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process. We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv Detail & Related papers (2023-09-12T14:36:23Z)
Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs) In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol. We propose an interactive Evaluation approach based on LLMs named iEvaLM that harnesses LLM-based user simulators.
arXiv Detail & Related papers (2023-05-22T15:12:43Z)
Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [59.74002011562726]
We propose a novel linguistic cue-based chain-of-thoughts (textitCue-CoT) to provide a more personalized and engaging response. We build a benchmark with in-depth dialogue questions, consisting of 6 datasets in both Chinese and English. Empirical results demonstrate our proposed textitCue-CoT method outperforms standard prompting methods in terms of both textithelpfulness and textitacceptability on all datasets.
arXiv Detail & Related papers (2023-05-19T16:27:43Z)
Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning [94.50608198582636]
Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques. We take a novel perspective of IF game solving and re-formulate it as Multi-Passage Reading (MPRC) tasks.
arXiv Detail & Related papers (2020-10-05T23:09:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.