Reward Constrained Interactive Recommendation with Natural Language Feedback
- URL: http://arxiv.org/abs/2005.01618v1
- Date: Mon, 4 May 2020 16:23:34 GMT
- Title: Reward Constrained Interactive Recommendation with Natural Language Feedback
- Authors: Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin, Changyou Chen, Lawrence Carin
- Abstract summary: We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations that violate users' historical preferences.
Our proposed framework is general and is further extended to the task of constrained text generation.
- Score: 158.8095688415973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-based interactive recommendation provides richer user feedback and has
demonstrated advantages over traditional interactive recommender systems.
However, recommendations can easily violate preferences that users have
expressed in past natural-language feedback, because the recommender must
explore new items to keep improving. To alleviate this issue, we propose a novel
constraint-augmented reinforcement learning (RL) framework to efficiently
incorporate user preferences over time. Specifically, we leverage a
discriminator to detect recommendations that violate the user's historical
preferences; this violation signal is incorporated into the standard RL
objective of maximizing expected
cumulative future rewards. Our proposed framework is general and is further
extended to the task of constrained text generation. Empirical results show
that the proposed method yields consistent improvement relative to standard RL
methods.
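To make the objective concrete, below is a minimal sketch of the constraint-augmented reward shaping the abstract describes: a discriminator scores how strongly a candidate recommendation violates past feedback, and that score penalizes the environment reward. The function names and the Lagrangian-style weight `lam` are illustrative assumptions, not the paper's implementation.

```python
def constrained_reward(env_reward, violation_score, lam=0.5):
    """Penalize the environment reward by the discriminator's violation
    score, a Lagrangian-style relaxation of the preference constraint
    (illustrative form, not the paper's exact objective)."""
    return env_reward - lam * violation_score

def shaped_return(env_rewards, violation_scores, lam=0.5, gamma=0.99):
    """Discounted sum of shaped rewards over a toy rollout; in the full
    method the violation scores come from a learned discriminator."""
    total = 0.0
    for t, (r, v) in enumerate(zip(env_rewards, violation_scores)):
        total += (gamma ** t) * constrained_reward(r, v, lam)
    return total

# Example: a recommendation at t=1 that contradicts past feedback
# (violation score 0.9) drags down the cumulative objective.
print(shaped_return(env_rewards=[1.0, 1.0, 0.0],
                    violation_scores=[0.0, 0.9, 0.1]))
```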
Related papers
- RLVF: Learning from Verbal Feedback without Overgeneralization [94.19501420241188]
We study the problem of incorporating verbal feedback without such overgeneralization.
We develop a new method, Contextualized Critiques with Constrained Preference Optimization (C3PO).
Our approach effectively applies verbal feedback to relevant scenarios while preserving existing behaviors for other contexts.
arXiv Detail & Related papers (2024-02-16T18:50:24Z)
- Hierarchical Reinforcement Learning for Modeling User Novelty-Seeking Intent in Recommender Systems [26.519571240032967]
We propose a novel hierarchical reinforcement learning-based method to model users' hierarchical novelty-seeking intent.
We further incorporate diversity- and novelty-related measures into the reward function of the hierarchical RL (HRL) agent to encourage user exploration; a reward-shaping sketch follows this entry.
arXiv Detail & Related papers (2023-06-02T12:02:23Z)
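As a rough illustration of the reward augmentation described in the HRL entry above (not the authors' code; the bonus definitions and weights `alpha` and `beta` are assumptions), one simple form adds novelty and diversity terms to the engagement reward:

```python
def novelty_bonus(item, history):
    """1.0 for an unseen item, decaying with repeat exposure
    (one simple choice; the paper's exact measure may differ)."""
    return 1.0 / (1.0 + history.count(item))

def diversity_bonus(slate):
    """Fraction of distinct items in the recommended slate."""
    return len(set(slate)) / len(slate)

def hrl_reward(base_reward, slate, history, alpha=0.1, beta=0.1):
    # Augment the engagement reward with novelty- and diversity-related
    # terms to encourage exploration, as the entry above describes.
    nov = sum(novelty_bonus(i, history) for i in slate) / len(slate)
    return base_reward + alpha * nov + beta * diversity_bonus(slate)

# Repeats ("b" twice, "a" already seen) earn smaller bonuses.
print(hrl_reward(1.0, slate=["a", "b", "b"], history=["a"]))
```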
- Editable User Profiles for Controllable Text Recommendation [66.00743968792275]
We propose LACE, a novel concept value bottleneck model for controllable text recommendations.
LACE represents each user with a succinct set of human-readable concepts.
It learns personalized representations of the concepts based on user documents.
arXiv Detail & Related papers (2023-04-09T14:52:18Z)
- Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System [11.404192885921498]
Chat-Rec is a new paradigm for building conversational recommender systems.
Chat-Rec is effective in learning user preferences and establishing connections between users and products.
In experiments, Chat-Rec effectively improves top-k recommendation results and performs better on the zero-shot rating prediction task; a prompt-construction sketch follows this entry.
arXiv Detail & Related papers (2023-03-25T17:37:43Z)
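A minimal, hypothetical sketch of the LLM-augmented paradigm the Chat-Rec entry above describes: serialize the user's profile and interaction history into a prompt and ask a chat model for a ranked top-k list. The prompt format and the `build_recommendation_prompt` helper are illustrative assumptions, not Chat-Rec's actual interface.

```python
def build_recommendation_prompt(profile, history, candidates, k=5):
    """Assemble user context into a natural-language prompt for an
    LLM-backed recommender (placeholder format, not Chat-Rec's own)."""
    lines = [
        f"User profile: {profile}",
        "Recently interacted items (most recent last):",
        *[f"- {item}" for item in history],
        f"Candidate items: {', '.join(candidates)}",
        f"Recommend the top {k} candidates for this user, best first,",
        "and briefly explain each choice.",
    ]
    return "\n".join(lines)

prompt = build_recommendation_prompt(
    profile="enjoys sci-fi films, dislikes horror",
    history=["Arrival", "Interstellar"],
    candidates=["Dune", "The Shining", "Blade Runner 2049"],
)
print(prompt)  # pass to any chat-completion API; response parsing omitted
```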
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of action selection by modeling full slates (see the sketch after this entry).
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
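To illustrate the latent-slate idea from the entry above: once a VAE has been trained on slates, the policy can act in the low-dimensional latent space, and a decoder maps each continuous latent action back to a concrete slate. The random linear decoder below is a stand-in assumption for the trained VAE decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, LATENT_DIM, SLATE_SIZE = 100, 8, 4

# Stand-in for a trained VAE decoder: one linear map per slate position
# from the latent action to item logits (in the actual method these
# weights would come from VAE training on observed slates).
decoder_weights = rng.normal(size=(SLATE_SIZE, LATENT_DIM, N_ITEMS))

def decode_slate(z):
    """Map a continuous latent action z to a concrete slate by taking
    the highest-scoring item at each slate position."""
    logits = np.einsum("d,pdn->pn", z, decoder_weights)
    return logits.argmax(axis=1).tolist()

# The RL policy now explores the low-dimensional latent space instead
# of the combinatorial space of all possible item slates.
z = rng.normal(size=LATENT_DIM)  # e.g., a policy's continuous action
print(decode_slate(z))
```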
- Comparison-based Conversational Recommender System with Relative Bandit Feedback [15.680698037463488]
We propose a novel comparison-based conversational recommender system.
We propose a new bandit algorithm, which we call RelativeConUCB.
The experiments on both synthetic and real-world datasets validate the advantage of our proposed method.
arXiv Detail & Related papers (2022-08-21T08:05:46Z)
- Two-Stage Neural Contextual Bandits for Personalised News Recommendation [50.3750507789989]
Existing personalised news recommendation methods focus on exploiting user interests and ignore exploration in recommendation.
We build on contextual bandits recommendation strategies which naturally address the exploitation-exploration trade-off.
We use deep learning representations for users and news, and generalise the neural upper confidence bound (UCB) policies to generalised additive UCB and bilinear UCB; a baseline LinUCB sketch follows this entry.
arXiv Detail & Related papers (2022-06-26T12:07:56Z)
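The UCB policies in the entry above extend the classic exploitation-exploration trade-off of linear UCB. As a baseline sketch (a simplification, not the paper's generalised additive or bilinear variants), LinUCB scores each article by a mean reward estimate plus a confidence bonus:

```python
import numpy as np

class LinUCB:
    """Minimal linear UCB scorer over user-news feature vectors."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)    # regularized design matrix
        self.b = np.zeros(dim)  # reward-weighted feature sum
        self.alpha = alpha      # exploration strength

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        # Mean estimate plus confidence width: the bonus favors
        # articles whose feature directions are still poorly explored.
        return x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

bandit = LinUCB(dim=3)
bandit.update(np.array([1.0, 0.0, 0.0]), reward=1.0)
scores = [bandit.ucb(np.array(x)) for x in ([1, 0, 0], [0, 1, 0])]
print(scores)  # the unexplored arm carries a larger confidence bonus
```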
- CausPref: Causal Preference Learning for Out-of-Distribution Recommendation [36.22965012642248]
Current recommender systems are still vulnerable to distribution shifts of users and items in realistic scenarios.
We propose to incorporate the recommendation-specific DAG learner into a novel causal preference-based recommendation framework named CausPref.
Our approach significantly surpasses the benchmark models under various types of out-of-distribution settings.
arXiv Detail & Related papers (2022-02-08T16:42:03Z)
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
- Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation [27.17948754183511]
Reinforcement learning has shown great promise in optimizing long-term user interest in recommender systems.
Existing RL-based recommendation methods need a large number of interactions for each user to learn a robust recommendation policy.
We propose a meta-level model-based reinforcement learning approach for fast user adaptation.
arXiv Detail & Related papers (2020-12-04T08:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.