Do Reviews Matter for Recommendations in the Era of Large Language Models?
- URL: http://arxiv.org/abs/2512.12978v1
- Date: Mon, 15 Dec 2025 04:46:48 GMT
- Title: Do Reviews Matter for Recommendations in the Era of Large Language Models?
- Authors: Chee Heng Tan, Huiying Zheng, Jing Wang, Zhuoyi Lin, Shaodi Feng, Huijing Zhan, Xiaoli Li, J. Senthilnath,
- Abstract summary: With the advent of large language models (LLMs), the landscape of recommender systems is undergoing a significant transformation. Traditionally, user reviews have served as a critical source of rich, contextual information for enhancing recommendation quality. This paper provides a systematic investigation of the evolving role of text reviews in recommendation by comparing deep learning methods and LLM approaches.
- Score: 8.772803183525284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advent of large language models (LLMs), the landscape of recommender systems is undergoing a significant transformation. Traditionally, user reviews have served as a critical source of rich, contextual information for enhancing recommendation quality. However, as LLMs demonstrate an unprecedented ability to understand and generate human-like text, this raises the question of whether explicit user reviews remain essential in the era of LLMs. In this paper, we provide a systematic investigation of the evolving role of text reviews in recommendation by comparing deep learning methods and LLM approaches. In particular, we conduct extensive experiments on eight public datasets with LLMs and evaluate their performance in zero-shot, few-shot, and fine-tuning scenarios. We further introduce a benchmarking evaluation framework for review-aware recommender systems, RAREval, to comprehensively assess the contribution of textual reviews to the recommendation performance of review-aware recommender systems. Our framework examines various scenarios, including the removal of some or all textual reviews, random distortion, as well as recommendation performance in data sparsity and cold-start user settings. Our findings demonstrate that LLMs are capable of functioning as effective review-aware recommendation engines, generally outperforming traditional deep learning approaches, particularly in scenarios characterized by data sparsity and cold-start conditions. In addition, the removal of some or all textual reviews and random distortion does not necessarily lead to declines in recommendation accuracy. These findings motivate a rethinking of how user preferences expressed in text reviews can be more effectively leveraged. All code and supplementary materials are available at: https://github.com/zhytk/RAREval-data-processing.
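The review-perturbation scenarios the abstract describes (removing some or all textual reviews, random distortion) can be sketched as a small evaluation harness. The following is an illustrative sketch only, not the actual RAREval implementation: the record layout, the scenario names, and the `model_predict` callable are assumptions made for illustration.

```python
import random

def ablate_reviews(interactions, mode="drop_all", frac=0.5, seed=0):
    """Return a perturbed copy of (user, item, rating, review) records.

    Hypothetical scenario names, loosely mirroring those in the abstract:
      - "drop_all":  remove every review
      - "drop_some": remove a random fraction `frac` of reviews
      - "distort":   randomly swap reviews between interactions
    """
    rng = random.Random(seed)
    records = [dict(r) for r in interactions]  # shallow copies; originals untouched
    if mode == "drop_all":
        for r in records:
            r["review"] = ""
    elif mode == "drop_some":
        for r in records:
            if rng.random() < frac:
                r["review"] = ""
    elif mode == "distort":
        reviews = [r["review"] for r in records]
        rng.shuffle(reviews)  # reviews no longer match their interactions
        for r, rev in zip(records, reviews):
            r["review"] = rev
    else:
        raise ValueError(f"unknown mode: {mode}")
    return records

def mean_absolute_error(model_predict, records):
    """MAE of a rating predictor over (possibly perturbed) records."""
    errors = [abs(model_predict(r) - r["rating"]) for r in records]
    return sum(errors) / len(errors)
```

Comparing `mean_absolute_error` on the original records against each perturbed variant then shows how much accuracy actually depends on the review text, which is the kind of comparison the abstract reports.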
Related papers
- Enhancing Rating Prediction with Off-the-Shelf LLMs Using In-Context User Reviews [16.394933051332657]
Likert-scale rating prediction is a regression task that requires both language and mathematical reasoning to be solved effectively. This study investigates the performance of off-the-shelf LLMs on rating prediction, providing different in-context information. We find that user-written reviews significantly improve the rating prediction performance of LLMs.
arXiv Detail & Related papers (2025-10-01T03:04:20Z) - Learning to Shop Like Humans: A Review-driven Retrieval-Augmented Recommendation Framework with LLMs [30.748667156183004]
RevBrowse is a review-driven recommendation framework inspired by the "browse-then-decide" decision process. RevBrowse integrates user reviews into the LLM-based reranking process to enhance its ability to distinguish between candidate items. PrefRAG is a retrieval-augmented module that disentangles user and item representations into structured forms.
arXiv Detail & Related papers (2025-08-31T04:37:43Z) - AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews [18.50142644126276]
We evaluate the alignment of automatic paper reviews with human reviews using an arena of human preferences by pairwise comparisons.
We fine-tune an LLM to predict human preferences, predicting which reviews humans will prefer in a head-to-head battle between LLMs.
We make the reviews of publicly available arXiv and open-access Nature journal papers available online, along with a free service which helps authors review and revise their research papers and improve their quality.
arXiv Detail & Related papers (2024-08-19T19:10:38Z) - CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework [4.4206696279087]
We propose a framework for news recommendation using Large Language Models (LLMs) named CherryRec.
CherryRec ensures the quality of recommendations while accelerating the recommendation process.
We validate the effectiveness of the proposed framework by comparing it with state-of-the-art baseline methods on benchmark datasets.
arXiv Detail & Related papers (2024-06-18T03:33:38Z) - Large Language Models as Evaluators for Recommendation Explanations [23.938202791437337]
We investigate whether LLMs can serve as evaluators of recommendation explanations.
We design and apply a 3-level meta evaluation strategy to measure the correlation between evaluator labels and the ground truth provided by users.
Our study verifies that utilizing LLMs as evaluators can be an accurate, reproducible and cost-effective solution for evaluating recommendation explanation texts.
arXiv Detail & Related papers (2024-06-05T13:23:23Z) - Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis [91.5632751731927]
Large Language Models such as ChatGPT have showcased remarkable abilities in solving general tasks. We propose a general framework for utilizing LLMs in recommendation tasks, focusing on the capabilities of LLMs as recommenders. We analyze the impact of public availability, tuning strategies, model architecture, parameter scale, and context length on recommendation results.
arXiv Detail & Related papers (2024-01-10T08:28:56Z) - Unlocking the Potential of Large Language Models for Explainable Recommendations [55.29843710657637]
It remains uncertain what impact replacing the explanation generator with the recently emerging large language models (LLMs) would have.
In this study, we propose LLMXRec, a simple yet effective two-stage explainable recommendation framework.
By adopting several key fine-tuning techniques, controllable and fluent explanations can be well generated.
arXiv Detail & Related papers (2023-12-25T09:09:54Z) - GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z) - LLM-Rec: Personalized Recommendation via Prompting Large Language Models [62.481065357472964]
Recent advances in large language models (LLMs) have showcased their remarkable ability to harness commonsense knowledge and reasoning.
This study introduces a novel approach, coined LLM-Rec, which incorporates four distinct prompting strategies of text enrichment for improving personalized text-based recommendations.
arXiv Detail & Related papers (2023-07-24T18:47:38Z) - A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP).
This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z) - Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation [52.62492168507781]
We propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM).
This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes.
By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations.
arXiv Detail & Related papers (2023-05-12T16:54:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.