Vague Preference Policy Learning for Conversational Recommendation
- URL: http://arxiv.org/abs/2306.04487v5
- Date: Fri, 21 Feb 2025 12:35:19 GMT
- Title: Vague Preference Policy Learning for Conversational Recommendation
- Authors: Gangyi Zhang, Chongming Gao, Wenqiang Lei, Xiaojie Guo, Shijun Li, Hongshen Chen, Zhuozhi Ding, Sulong Xu, Lingfei Wu
- Abstract summary: Conversational recommendation systems commonly assume users have clear preferences, leading to potential over-filtering. We introduce the Vague Preference Multi-round Conversational Recommendation (VPMCR) scenario, employing a soft estimation mechanism to accommodate users' vague and dynamic preferences. Our work advances CRS by accommodating users' inherent ambiguity and relative decision-making processes, improving real-world applicability.
- Score: 48.868921530958666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conversational recommendation systems (CRS) commonly assume users have clear preferences, leading to potential over-filtering of relevant alternatives. However, users often exhibit vague, non-binary preferences. We introduce the Vague Preference Multi-round Conversational Recommendation (VPMCR) scenario, employing a soft estimation mechanism to accommodate users' vague and dynamic preferences while mitigating over-filtering. In VPMCR, we propose Vague Preference Policy Learning (VPPL), consisting of Ambiguity-aware Soft Estimation (ASE) and Dynamism-aware Policy Learning (DPL). ASE captures preference vagueness by estimating scores for clicked and non-clicked options, using a choice-based approach and time-aware preference decay. DPL leverages ASE's preference distribution to guide the conversation and adapt to preference changes for recommendations or attribute queries. Extensive experiments demonstrate VPPL's effectiveness within VPMCR, outperforming existing methods and setting a new benchmark. Our work advances CRS by accommodating users' inherent ambiguity and relative decision-making processes, improving real-world applicability.
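The abstract's Ambiguity-aware Soft Estimation (ASE) assigns soft scores to both clicked and non-clicked options and down-weights older evidence with a time-aware decay. The snippet below is a minimal sketch of that idea under assumed details: the exponential decay factor, the click/non-click weights, and all function names are illustrative and are not taken from the paper.

```python
# Minimal sketch of ambiguity-aware soft preference estimation: clicked and
# non-clicked attribute options both receive (soft) scores, and evidence from
# older turns is down-weighted. The decay form and weights are assumptions.
from collections import defaultdict

DECAY = 0.8             # assumed per-turn decay factor
CLICK_WEIGHT = 1.0      # evidence weight for a clicked option
NON_CLICK_WEIGHT = 0.3  # soft, non-zero weight for options shown but not clicked

def soft_preference_scores(turns):
    """turns: list of (clicked_attrs, shown_attrs) per conversation round,
    oldest first. Returns a dict mapping attribute -> soft preference score."""
    scores = defaultdict(float)
    n = len(turns)
    for t, (clicked, shown) in enumerate(turns):
        decay = DECAY ** (n - 1 - t)  # more recent turns count more
        for attr in shown:
            # non-clicked options are softly penalized rather than filtered out
            weight = CLICK_WEIGHT if attr in clicked else NON_CLICK_WEIGHT
            scores[attr] += decay * weight
    return dict(scores)

if __name__ == "__main__":
    history = [({"jazz"}, {"jazz", "rock"}),
               ({"live"}, {"live", "studio", "rock"})]
    print(soft_preference_scores(history))
```

A downstream policy (DPL in the paper) could then use this score distribution to decide whether to ask about another attribute or make a recommendation; that decision logic is not sketched here.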
Related papers
- Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User [117.82681846559909]
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations.
We propose a generative-reward-model-based simulated user, named GRSU, for automatic interaction with CRSs.
arXiv Detail & Related papers (2025-04-29T06:37:30Z)
- Empowering Retrieval-based Conversational Recommendation with Contrasting User Preferences [12.249992789091415]
We propose a novel conversational recommender model, called COntrasting user pReference expAnsion and Learning (CORAL).
CORAL extracts the user's hidden preferences through contrasting preference expansion.
It explicitly differentiates the contrasting preferences and leverages them into the recommendation process via preference-aware learning.
arXiv Detail & Related papers (2025-03-27T21:45:49Z)
- Preference Discerning with LLM-Enhanced Generative Retrieval [28.309905847867178]
We propose a new paradigm, which we term preference discerning. In preference discerning, we explicitly condition a generative sequential recommendation system on user preferences within its context. We generate user preferences using Large Language Models (LLMs) based on user reviews and item-specific data.
arXiv Detail & Related papers (2024-12-11T18:26:55Z)
- Harm Mitigation in Recommender Systems under User Preference Dynamics [16.213153879446796]
We consider a recommender system that takes into account the interplay between recommendations, user interests, and harmful content.
We seek recommendation policies that establish a tradeoff between maximizing click-through rate (CTR) and mitigating harm.
arXiv Detail & Related papers (2024-06-14T09:52:47Z)
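The harm-mitigation entry above seeks policies that trade off click-through rate against harm. Below is a minimal illustrative sketch of such a tradeoff as a weighted scoring rule; the linear `ctr - lam * harm` form and all names are assumptions, not the paper's formulation.

```python
# Illustrative sketch (not the paper's algorithm): rank items by a combined
# objective that trades off estimated CTR against estimated harm, with a
# hypothetical weight `lam` controlling the tradeoff.
from typing import Callable, Sequence

def tradeoff_ranking(items: Sequence[str],
                     ctr: Callable[[str], float],
                     harm: Callable[[str], float],
                     lam: float = 0.5) -> list:
    """Return items sorted by estimated CTR minus a harm penalty."""
    scored = [(item, ctr(item) - lam * harm(item)) for item in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    ctr_est = {"a": 0.6, "b": 0.4}.get
    harm_est = {"a": 0.9, "b": 0.1}.get
    print(tradeoff_ranking(["a", "b"], ctr_est, harm_est, lam=0.5))
```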
- Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
- Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization [105.3612692153615]
We propose a new axis based on eliciting preferences jointly over instruction-response pairs. Joint preferences over instruction and response pairs can significantly enhance the alignment of large language models.
arXiv Detail & Related papers (2024-03-31T02:05:40Z)
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
- Estimating and Penalizing Induced Preference Shifts in Recommender Systems [10.052697877248601]
We argue that system designers should: estimate the shifts a recommender would induce; evaluate whether such shifts would be undesirable; and even actively optimize to avoid problematic shifts.
We do this by using historical user interaction data to train a predictive user model that implicitly captures users' preference dynamics.
In simulated experiments, we show that our learned preference dynamics model is effective in estimating user preferences and how they would respond to new recommenders.
arXiv Detail & Related papers (2022-04-25T21:04:46Z)
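The entry above trains a predictive user model from historical interactions and uses it to estimate the preference shifts a recommender would induce. The sketch below illustrates that pipeline under a deliberately simple assumed dynamics model (a linear update with rate alpha); neither the update rule nor the names come from the paper.

```python
# Hedged sketch: fit a simple (assumed linear) preference-dynamics model from
# logged interactions, then roll it forward under a candidate recommender to
# estimate the preference shift it would induce.
import numpy as np

def fit_dynamics(prefs: np.ndarray, recs: np.ndarray) -> float:
    """Assume p_{t+1} = p_t + alpha * (rec_t - p_t) and estimate alpha by
    least squares from logged preference and recommendation vectors (T x d)."""
    deltas = prefs[1:] - prefs[:-1]
    drive = recs[:-1] - prefs[:-1]
    num = float((deltas * drive).sum())
    den = float((drive * drive).sum()) + 1e-8
    return num / den

def simulate_shift(p0: np.ndarray, recommender, steps: int, alpha: float) -> float:
    """Roll the fitted dynamics forward under a candidate recommender policy
    and return the magnitude of the induced preference shift."""
    p = p0.copy()
    for _ in range(steps):
        p = p + alpha * (recommender(p) - p)
    return float(np.linalg.norm(p - p0))
```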
- Reward Constrained Interactive Recommendation with Natural Language Feedback [158.8095688415973]
We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations that violate the user's historical preferences.
Our proposed framework is general and is further extended to the task of constrained text generation.
arXiv Detail & Related papers (2020-05-04T16:23:34Z)
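The last entry uses a discriminator to flag recommendations that violate the user's historical preferences within a constraint-augmented RL framework. The sketch below shows one plausible way such a constraint could enter the reward as a penalty term; the shaping form and all names are assumptions rather than the paper's implementation.

```python
# Illustrative sketch (assumed, not the paper's implementation) of
# constraint-augmented reward shaping: a discriminator scores how strongly a
# candidate recommendation violates the user's historical preferences, and
# that score is subtracted from the environment reward before the RL update.
def shaped_reward(env_reward: float,
                  violation_prob: float,
                  penalty: float = 1.0) -> float:
    """violation_prob: discriminator probability that the recommendation
    violates the user's historical preferences (0 = consistent, 1 = violation)."""
    return env_reward - penalty * violation_prob

if __name__ == "__main__":
    # e.g., a click (reward 1.0) on an item the discriminator flags at 0.8
    print(shaped_reward(1.0, 0.8))
```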
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.