Beyond Means: A Dynamic Framework for Predicting Customer Satisfaction
- URL: http://arxiv.org/abs/2511.14743v1
- Date: Tue, 18 Nov 2025 18:43:29 GMT
- Title: Beyond Means: A Dynamic Framework for Predicting Customer Satisfaction
- Authors: Christof Naumzik, Abdurahman Maarouf, Stefan Feuerriegel, Markus Weinmann
- Abstract summary: We demonstrate the value of using the Gaussian process (GP) framework for rating aggregation. Based on 121,123 Yelp ratings, we compare the predictive power of different rating aggregation methods in predicting future ratings. Our findings have important implications for marketing practitioners and customers.
- Score: 29.75950401212671
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Online ratings influence customer decision-making, yet standard aggregation methods, such as the sample mean, fail to adapt to quality changes over time and ignore review heterogeneity (e.g., review sentiment, a review's helpfulness). To address these challenges, we demonstrate the value of using the Gaussian process (GP) framework for rating aggregation. Specifically, we present a tailored GP model that captures the dynamics of ratings over time while additionally accounting for review heterogeneity. Based on 121,123 ratings from Yelp, we compare the predictive power of different rating aggregation methods in predicting future ratings, thereby finding that the GP model is considerably more accurate and reduces the mean absolute error by 10.2% compared to the sample mean. Our findings have important implications for marketing practitioners and customers. By moving beyond means, designers of online reputation systems can display more informative and adaptive aggregated rating scores that are accurate signals of expected customer satisfaction.
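The core idea of GP-based rating aggregation can be illustrated with a minimal sketch. This is not the paper's tailored model (which additionally encodes review heterogeneity such as sentiment and helpfulness); it only shows, on synthetic data, how treating latent quality as a smooth function of time lets a GP adapt to quality drift that a sample mean averages away. The data-generating process and kernel choices below are illustrative assumptions.

```python
# Minimal sketch: GP rating aggregation vs. the sample mean (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Synthetic business whose latent quality drifts upward over time.
t = np.sort(rng.uniform(0.0, 2.0, 200))            # review timestamps
quality = 3.0 + 0.8 * t                            # latent quality trend
ratings = np.clip(np.round(quality + rng.normal(0.0, 0.7, t.size)), 1, 5)

train_t, train_r = t[:150], ratings[:150]          # past reviews
test_t, test_r = t[150:], ratings[150:]            # future ratings to predict

# GP over time: RBF kernel for the smooth trend, WhiteKernel for rating noise.
gp = GaussianProcessRegressor(
    kernel=1.0 * RBF(length_scale=0.5) + WhiteKernel(noise_level=0.5),
    normalize_y=True,
)
gp.fit(train_t.reshape(-1, 1), train_r)

gp_pred = gp.predict(test_t.reshape(-1, 1))
mean_pred = np.full(test_r.shape, train_r.mean())  # sample-mean baseline

mae_gp = np.mean(np.abs(gp_pred - test_r))
mae_mean = np.mean(np.abs(mean_pred - test_r))
print(f"MAE (GP):          {mae_gp:.3f}")
print(f"MAE (sample mean): {mae_mean:.3f}")
```

Because the sample mean pools all past ratings, it lags behind the quality drift, while the GP's posterior mean near the most recent reviews tracks the current quality level, which is the mechanism behind the reported error reduction.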
Related papers
- K-Sort Eval: Efficient Preference Evaluation for Visual Generation via Corrected VLM-as-a-Judge [51.93484138861584]
The rapid development of visual generative models raises the need for more scalable and human-aligned evaluation methods. We propose K-Sort Eval, a reliable and efficient VLM-based evaluation framework that integrates posterior correction and dynamic matching. Experiments show that K-Sort Eval delivers evaluation results consistent with K-Sort Arena, typically requiring fewer than 90 model runs.
arXiv Detail & Related papers (2026-02-10T05:07:46Z) - Fault-Tolerant Evaluation for Sample-Efficient Model Performance Estimators [13.227055178509524]
We propose a fault-tolerant evaluation framework that integrates bias and variance considerations within an adjustable tolerance level. We show that proper calibration of $\varepsilon$ ensures reliable evaluation across different variance regimes. Experiments on real-world datasets demonstrate that our framework provides comprehensive and actionable insights into estimator behavior.
arXiv Detail & Related papers (2026-02-06T22:14:46Z) - Reference-Free Rating of LLM Responses via Latent Information [53.463883683503106]
We study the common practice of asking a judge model to assign Likert-scale scores to free-text responses. We then propose and evaluate Latent Judges, which derive scalar ratings from internal model signals. Across a broad suite of pairwise and single-rating benchmarks, latent methods match or surpass standard prompting.
arXiv Detail & Related papers (2025-09-29T12:15:52Z) - Probe-Rewrite-Evaluate: A Workflow for Reliable Benchmarks and Quantifying Evaluation Awareness [6.071703608560761]
Large Language Models (LLMs) often exhibit significant behavioral shifts when they perceive a change from a real-world deployment context to a controlled evaluation setting. This discrepancy poses a critical challenge for AI alignment, as benchmark performance may not accurately reflect a model's true safety and honesty. We introduce a methodology that uses a linear probe to score prompts on a continuous scale from "test-like" to "deploy-like".
arXiv Detail & Related papers (2025-08-30T19:03:14Z) - Mitigating the Participation Bias by Balancing Extreme Ratings [3.5785450878667597]
We consider a robust rating aggregation task under the participation bias. Our goal is to minimize the expected squared loss between the aggregated ratings and the average of all underlying ratings.
arXiv Detail & Related papers (2025-02-06T02:58:46Z) - Analytical and Empirical Study of Herding Effects in Recommendation Systems [72.6693986712978]
We study how to manage product ratings via rating aggregation rules and shortlisted representative reviews.
We show that proper recency aware rating aggregation rules can improve the speed of convergence in Amazon and TripAdvisor.
arXiv Detail & Related papers (2024-08-20T14:29:23Z) - Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition [99.7047087527422]
In this work, we demonstrate that competition can fundamentally alter the behavior of machine learning scaling trends.
We find many settings where improving data representation quality decreases the overall predictive accuracy across users.
At a conceptual level, our work suggests that favorable scaling trends for individual model-providers need not translate to downstream improvements in social welfare.
arXiv Detail & Related papers (2023-06-26T13:06:34Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - Spatio-Temporal Graph Representation Learning for Fraudster Group Detection [50.779498955162644]
Companies may hire fraudster groups to write fake reviews to either demote competitors or promote their own businesses.
To detect such groups, a common model is to represent fraudster groups' static networks.
We propose to first capitalize on the effectiveness of the HIN-RNN for reviewers' representation learning.
arXiv Detail & Related papers (2022-01-07T08:01:38Z) - User and Item-aware Estimation of Review Helpfulness [4.640835690336653]
We investigate the role of deviations in the properties of reviews as helpfulness determinants.
We propose a novel helpfulness estimation model that extends previous ones.
Our model is thus an effective tool to select relevant user feedback for decision-making.
arXiv Detail & Related papers (2020-11-20T15:35:56Z)