Related papers: Can Large Language Models Understand Preferences in Personalized Recommendation?

Can Large Language Models Understand Preferences in Personalized Recommendation?

URL: http://arxiv.org/abs/2501.13391v1
Date: Thu, 23 Jan 2025 05:24:18 GMT
Title: Can Large Language Models Understand Preferences in Personalized Recommendation?
Authors: Zhaoxuan Tan, Zinan Zeng, Qingkai Zeng, Zhenyu Wu, Zheyuan Liu, Fengran Mo, Meng Jiang,
Abstract summary: We introduce PerRecBench, disassociating evaluation from user rating bias and item quality.<n>We find that the LLM-based recommendation techniques that are generally good at rating prediction fail to identify users' favored and disfavored items when the user rating bias and item quality are eliminated.<n>Our findings reveal the superiority of pairwise and listwise ranking approaches over pointwise ranking, PerRecBench's low correlation with traditional regression metrics, the importance of user profiles, and the role of pretraining data distributions.
Score: 32.2250928311146
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) excel in various tasks, including personalized recommendations. Existing evaluation methods often focus on rating prediction, relying on regression errors between actual and predicted ratings. However, user rating bias and item quality, two influential factors behind rating scores, can obscure personal preferences in user-item pair data. To address this, we introduce PerRecBench, disassociating the evaluation from these two factors and assessing recommendation techniques on capturing the personal preferences in a grouped ranking manner. We find that the LLM-based recommendation techniques that are generally good at rating prediction fail to identify users' favored and disfavored items when the user rating bias and item quality are eliminated by grouping users. With PerRecBench and 19 LLMs, we find that while larger models generally outperform smaller ones, they still struggle with personalized recommendation. Our findings reveal the superiority of pairwise and listwise ranking approaches over pointwise ranking, PerRecBench's low correlation with traditional regression metrics, the importance of user profiles, and the role of pretraining data distributions. We further explore three supervised fine-tuning strategies, finding that merging weights from single-format training is promising but improving LLMs' understanding of user preferences remains an open research problem. Code and data are available at https://github.com/TamSiuhin/PerRecBench

Related papers

From Rankings to Insights: Evaluation Should Shift Focus from Leaderboard to Feedback [36.68929551237421]
We introduce bftextFeedbacker, an evaluation framework that provides comprehensive and fine-grained results.<n>Our project homepage and dataset are available at https://liudan193.io/Feedbacker.
arXiv Detail & Related papers (2025-05-10T16:52:40Z)
Prompt-Based LLMs for Position Bias-Aware Reranking in Personalized Recommendations [0.0]
Large language models (LLMs) have been adopted for prompt-based recommendation.<n>LLMs face limitations such as limited context window size, inefficient pointwise and pairwise prompting, and difficulty handling listwise ranking.<n>We propose a hybrid framework that combines a traditional recommendation model with an LLM for reranking top-k items using structured prompts.
arXiv Detail & Related papers (2025-05-08T05:01:44Z)
Preference Diffusion for Recommendation [50.8692409346126]
We propose PreferDiff, a tailored optimization objective for DM-based recommenders. PreferDiff transforms BPR into a log-likelihood ranking objective to better capture user preferences. It is the first personalized ranking loss designed specifically for DM-based recommenders.
arXiv Detail & Related papers (2024-10-17T01:02:04Z)
Dissecting Human and LLM Preferences [80.55271307662365]
We find that humans are less sensitive to errors, favor responses that support their stances, and show clear dislike when models admit their limits. advanced LLMs like GPT-4-Turbo emphasize correctness, clarity, and harmlessness more. We show that preference-based evaluation can be intentionally manipulated.
arXiv Detail & Related papers (2024-02-17T14:34:31Z)
LiPO: Listwise Preference Optimization through Learning-to-Rank [62.02782819559389]
Policy can learn more effectively from a ranked list of plausible responses given the prompt.<n>We show that LiPO-$lambda$ can outperform DPO variants and SLiC by a clear margin on several preference alignment tasks.
arXiv Detail & Related papers (2024-02-02T20:08:10Z)
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models [32.843361525236965]
We analyze the effect of sparse feedback on the alignment and evaluation of large language models. We find that preferences from ratings and rankings significantly disagree 60% for both human and AI annotators. Our findings shed light on critical gaps in methods for evaluating the real-world utility of language models.
arXiv Detail & Related papers (2023-08-30T07:35:32Z)
Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework. AUR consists of a new uncertainty estimator along with a normal recommender model. As the chance of mislabeling reflects the potential of a pair, AUR makes recommendations according to the uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z)
Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items. Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate. We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
arXiv Detail & Related papers (2022-07-04T17:49:25Z)
Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR) CPR achieves unbiased recommendation without knowing the exposure mechanism. We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z)
Unbiased Pairwise Learning to Rank in Recommender Systems [4.058828240864671]
Unbiased learning to rank algorithms are appealing candidates and have already been applied in many applications with single categorical labels. We propose a novel unbiased LTR algorithm to tackle the challenges, which innovatively models position bias in the pairwise fashion. Experiment results on public benchmark datasets and internal live traffic show the superior results of the proposed method for both categorical and continuous labels.
arXiv Detail & Related papers (2021-11-25T06:04:59Z)
Correcting the User Feedback-Loop Bias for Recommendation Systems [34.44834423714441]
We propose a systematic and dynamic way to correct user feedback-loop bias in recommendation systems. Our method includes a deep-learning component to learn each user's dynamic rating history embedding. We empirically validated the existence of such user feedback-loop bias in real world recommendation systems.
arXiv Detail & Related papers (2021-09-13T15:02:55Z)
Set2setRank: Collaborative Set to Set Ranking for Implicit Feedback based Recommendation [59.183016033308014]
In this paper, we explore the unique characteristics of the implicit feedback and propose Set2setRank framework for recommendation. Our proposed framework is model-agnostic and can be easily applied to most recommendation prediction approaches.
arXiv Detail & Related papers (2021-05-16T08:06:22Z)
Dynamic-K Recommendation with Personalized Decision Boundary [41.70842736417849]
We develop a dynamic-K recommendation task as a joint learning problem with both ranking and classification objectives. We extend two state-of-the-art ranking-based recommendation methods, i.e., BPRMF and HRM, to the corresponding dynamic-K versions. Our experimental results on two datasets show that the dynamic-K models are more effective than the original fixed-N recommendation methods.
arXiv Detail & Related papers (2020-12-25T13:02:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.