Reformulating Conversational Recommender Systems as Tri-Phase Offline Policy Learning
- URL: http://arxiv.org/abs/2408.06809v2
- Date: Sat, 07 Sep 2024 09:02:29 GMT
- Title: Reformulating Conversational Recommender Systems as Tri-Phase Offline Policy Learning
- Authors: Gangyi Zhang, Chongming Gao, Hang Pan, Runzhe Teng, Ruizhe Li,
- Abstract summary: This paper introduces the Tri-Phase Offline Policy Learning-based Conversational Recommender System (TCRS).
- Score: 5.453444582931813
- Abstract: Existing Conversational Recommender Systems (CRS) predominantly utilize user simulators for training and evaluating recommendation policies. These simulators often oversimplify the complexity of user interactions by focusing solely on static item attributes, neglecting the rich, evolving preferences that characterize real-world user behavior. This limitation frequently leads to models that perform well in simulated environments but falter in actual deployment. Addressing these challenges, this paper introduces the Tri-Phase Offline Policy Learning-based Conversational Recommender System (TCRS), which significantly reduces dependency on real-time interactions and mitigates overfitting issues prevalent in traditional approaches. TCRS integrates a model-based offline learning strategy with a controllable user simulation that dynamically aligns with both personalized and evolving user preferences. Through comprehensive experiments, TCRS demonstrates enhanced robustness, adaptability, and accuracy in recommendations, outperforming traditional CRS models in diverse user scenarios. This approach not only provides a more realistic evaluation environment but also facilitates a deeper understanding of user behavior dynamics, thereby refining the recommendation process.
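The abstract describes a pipeline in which a user model is first fitted to logged interactions, the recommendation policy is then trained offline against a controllable simulated user whose preferences drift, and only afterwards is the policy evaluated. The following is a minimal illustrative sketch of that tri-phase loop; the class names, the linear preference model, the drift parameter, and the epsilon-greedy update are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of tri-phase offline policy learning for a CRS.
# Phase 1: fit a user preference model from logged interactions.
# Phase 2: train a recommendation policy against the simulated user, offline.
# Phase 3: evaluate the frozen policy on fresh simulated users.
# All names and the simple linear / epsilon-greedy machinery are assumptions.

rng = np.random.default_rng(0)
N_ITEMS, N_ATTRS = 50, 8
item_attrs = rng.integers(0, 2, size=(N_ITEMS, N_ATTRS)).astype(float)

class SimulatedUser:
    """Controllable user whose attribute preferences drift over a session."""
    def __init__(self, drift=0.05):
        self.prefs = rng.normal(size=N_ATTRS)
        self.drift = drift  # controls how fast the evolving preferences move

    def feedback(self, item_id):
        score = item_attrs[item_id] @ self.prefs
        self.prefs += self.drift * rng.normal(size=N_ATTRS)  # evolving preference
        return 1.0 if score > 0 else 0.0  # accept / reject

def fit_user_model(logged_interactions):
    """Phase 1: estimate attribute preferences from offline logs (least squares)."""
    X = np.array([item_attrs[i] for i, _ in logged_interactions])
    y = np.array([r for _, r in logged_interactions])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def train_policy(user_prefs_hat, episodes=200, eps=0.1):
    """Phase 2: epsilon-greedy policy learned entirely against the simulator."""
    q = np.zeros(N_ITEMS)
    counts = np.zeros(N_ITEMS)
    for _ in range(episodes):
        user = SimulatedUser()
        user.prefs = user_prefs_hat + 0.1 * rng.normal(size=N_ATTRS)  # personalize
        for _ in range(10):  # one simulated conversation
            item = rng.integers(N_ITEMS) if rng.random() < eps else int(np.argmax(q))
            r = user.feedback(item)
            counts[item] += 1
            q[item] += (r - q[item]) / counts[item]
    return q

# Offline logs collected by some earlier policy (random here).
logs = [(int(i), SimulatedUser().feedback(int(i))) for i in rng.integers(N_ITEMS, size=500)]
q_values = train_policy(fit_user_model(logs))

# Phase 3: evaluate the frozen policy on fresh simulated users.
hits = sum(SimulatedUser().feedback(int(np.argmax(q_values))) for _ in range(100))
print(f"simulated acceptance rate: {hits / 100:.2f}")
```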
Related papers
- Interactive Visualization Recommendation with Hier-SUCB [52.11209329270573]
We propose an interactive personalized visualization recommendation (PVisRec) system that learns from user feedback gathered in previous interactions.
For more interactive and accurate recommendations, we propose Hier-SUCB, a contextual semi-bandit in the PVisRec setting.
arXiv Detail & Related papers (2025-02-05T17:14:45Z) - Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments.
Online RL-based recommender systems also face challenges in production deployment due to the risks of exposing users to untrained or unstable policies.
Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline.
We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
arXiv Detail & Related papers (2025-01-23T16:37:44Z) - Stop Playing the Guessing Game! Target-free User Simulation for Evaluating Conversational Recommender Systems [15.481944998961847]
PEPPER is an evaluation protocol with target-free user simulators constructed from real-user interaction histories and reviews.
PEPPER enables realistic user-CRS dialogues without falling into simplistic guessing games.
PEPPER presents detailed measures for comprehensively evaluating the preference elicitation capabilities of CRSs.
arXiv Detail & Related papers (2024-11-25T07:36:20Z) - Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems [0.0]
We introduce Lusifer, a novel environment leveraging Large Language Models (LLMs) to generate simulated user feedback.
Lusifer synthesizes user profiles and interaction histories to simulate responses and behaviors toward recommended items; a minimal sketch of this kind of LLM-driven simulator loop appears after this list.
Lusifer accurately emulates user behavior and preferences even with reduced training data, achieving an RMSE of 1.3.
arXiv Detail & Related papers (2024-05-22T05:43:15Z) - A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems [14.646529557978512]
Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences.
Large Language Models (LLMs) have marked the onset of a new epoch in computational capabilities.
We introduce a Controllable, scalable, and human-Involved (CSHI) simulator framework that manages the behavior of user simulators.
arXiv Detail & Related papers (2024-05-13T03:02:56Z) - RLVF: Learning from Verbal Feedback without Overgeneralization [94.19501420241188]
We study the problem of incorporating verbal feedback without such overgeneralization.
We develop a new method, Contextualized Critiques with Constrained Preference Optimization (C3PO).
Our approach effectively applies verbal feedback to relevant scenarios while preserving existing behaviors for other contexts.
arXiv Detail & Related papers (2024-02-16T18:50:24Z) - Improving Conversational Recommendation Systems via Counterfactual Data Simulation [73.4526400381668]
Conversational recommender systems (CRSs) aim to provide recommendation services via natural language conversations.
Existing CRS approaches often suffer from the issue of insufficient training due to the scarcity of training data.
We propose a CounterFactual data simulation approach for CRS, named CFCRS, to alleviate the issue of data scarcity in CRSs.
arXiv Detail & Related papers (2023-06-05T12:48:56Z) - Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential for developing more powerful conversational recommender systems (CRSs).
In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol.
We propose an interactive Evaluation approach based on LLMs named iEvaLM that harnesses LLM-based user simulators.
arXiv Detail & Related papers (2023-05-22T15:12:43Z) - Meta Policy Learning for Cold-Start Conversational Recommendation [71.13044166814186]
We study CRS policy learning for cold-start users via meta reinforcement learning.
To facilitate policy adaptation, we design three synergetic components.
arXiv Detail & Related papers (2022-05-24T05:06:52Z) - Knowledge Graph-enhanced Sampling for Conversational Recommender System [20.985222879085832]
Conversational Recommendation Systems (CRS) use interactive dialogue to address the limitations of traditional recommendation systems.
This work proposes a contextual information enhancement model tailored for CRS, called Knowledge Graph-enhanced Sampling (KGenSam).
Two samplers are designed to enhance knowledge: one samples fuzzy samples with high uncertainty to elicit user preferences, and the other samples reliable negative samples to update the recommender.
arXiv Detail & Related papers (2021-10-13T11:00:50Z)
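Several entries above (iEvaLM, Lusifer, CSHI) evaluate or train a CRS against an LLM-driven user simulator. The sketch below shows the bare interaction loop such a simulator implies; the prompt format, the `toy_llm` and `toy_recommender` stand-ins, and the acceptance check are illustrative assumptions rather than any paper's actual protocol, and a real LLM client could be passed in place of `toy_llm`.

```python
from typing import Callable, List

# Hypothetical sketch of an LLM-driven user simulator loop for evaluating a CRS.
# `llm` is any text-in/text-out callable; a rule-based stand-in is used here so
# the example runs without an external service. Names and prompts are assumptions.

def make_user_prompt(profile: str, history: List[str], recommendation: str) -> str:
    """Compose the persona-conditioned prompt the simulated user answers."""
    return (
        f"You are a user with this profile: {profile}\n"
        f"Conversation so far: {' | '.join(history) or '(empty)'}\n"
        f"The system recommends: {recommendation}\n"
        "Reply as the user: accept, or state the attribute you want instead."
    )

def simulate_dialogue(profile: str,
                      recommender: Callable[[List[str]], str],
                      llm: Callable[[str], str],
                      max_turns: int = 5) -> bool:
    """Run one simulated conversation; return True if the user accepts an item."""
    history: List[str] = []
    for _ in range(max_turns):
        item = recommender(history)
        reply = llm(make_user_prompt(profile, history, item))
        history += [f"system: {item}", f"user: {reply}"]
        if "accept" in reply.lower():
            return True
    return False

# --- self-contained stand-ins so the sketch runs as-is -----------------------
def toy_llm(prompt: str) -> str:
    """Placeholder for a real LLM: accept recommendations matching the profile."""
    rec = next(l for l in prompt.splitlines() if l.startswith("The system recommends:"))
    return "accept" if "sci-fi" in rec.lower() else "I would prefer a sci-fi movie."

def toy_recommender(history: List[str]) -> str:
    """Switch to sci-fi once the simulated user has asked for it."""
    return "a sci-fi movie" if any("sci-fi" in turn for turn in history) else "a romance movie"

if __name__ == "__main__":
    accepted = simulate_dialogue("likes sci-fi movies", toy_recommender, toy_llm)
    print("simulated user accepted a recommendation:", accepted)
```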