Reward Constrained Interactive Recommendation with Natural Language
Feedback
- URL: http://arxiv.org/abs/2005.01618v1
- Date: Mon, 4 May 2020 16:23:34 GMT
- Title: Reward Constrained Interactive Recommendation with Natural Language
Feedback
- Authors: Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin, Changyou Chen, Lawrence
Carin
- Abstract summary: We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations that violate users' historical preferences.
Our proposed framework is general and is further extended to the task of constrained text generation.
- Score: 158.8095688415973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-based interactive recommendation provides richer user feedback and has
demonstrated advantages over traditional interactive recommender systems.
However, recommendations can easily violate preferences users expressed in their
past natural-language feedback, since the recommender needs to explore new
items for further improvement. To alleviate this issue, we propose a novel
constraint-augmented reinforcement learning (RL) framework to efficiently
incorporate user preferences over time. Specifically, we leverage a
discriminator to detect recommendations that violate a user's historical
preferences; this violation signal is incorporated into the standard RL
objective of maximizing expected
cumulative future rewards. Our proposed framework is general and is further
extended to the task of constrained text generation. Empirical results show
that the proposed method yields consistent improvement relative to standard RL
methods.
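The core idea — penalizing the RL reward whenever a discriminator flags a recommendation as violating past feedback — can be illustrated with a minimal Python sketch. All names here (`discriminator`, `constrained_reward`, the penalty weight `lam`) are illustrative assumptions, not the paper's actual implementation:

```python
def discriminator(item, disliked):
    """Hypothetical discriminator: returns 1.0 if the recommended item
    violates a past preference (was explicitly disliked), else 0.0."""
    return 1.0 if item in disliked else 0.0

def constrained_reward(item, base_reward, disliked, lam=2.0):
    """Constraint-augmented reward: the environment reward minus a
    penalty whenever the discriminator flags a violation. The agent
    then maximizes expected cumulative constrained reward as usual."""
    return base_reward - lam * discriminator(item, disliked)

# Toy example: the user previously disliked item "b".
disliked = {"b"}
r_ok = constrained_reward("a", 1.0, disliked)   # no violation, reward kept
r_bad = constrained_reward("b", 1.0, disliked)  # violation, reward penalized
```

In a full implementation the discriminator would be a learned classifier over natural-language feedback rather than a set lookup, but the shaping of the objective works the same way.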
Related papers
- Interactive Visualization Recommendation with Hier-SUCB [52.11209329270573]
We propose an interactive personalized visualization recommendation (PVisRec) system that learns from user feedback gathered in previous interactions.
For more interactive and accurate recommendations, we propose Hier-SUCB, a contextual semi-bandit in the PVisRec setting.
arXiv Detail & Related papers (2025-02-05T17:14:45Z)
- Reason4Rec: Large Language Models for Recommendation with Deliberative User Preference Alignment [69.11529841118671]
We propose a new Deliberative Recommendation task, which incorporates explicit reasoning about user preferences as an additional alignment goal.
We then introduce the Reasoning-powered Recommender framework for deliberative user preference alignment.
arXiv Detail & Related papers (2025-02-04T07:17:54Z)
- Preference Discerning with LLM-Enhanced Generative Retrieval [28.309905847867178]
We propose a new paradigm, which we term preference discerning.
In preference discerning, we explicitly condition a generative sequential recommendation system on user preferences within its context.
We generate user preferences using Large Language Models (LLMs) based on user reviews and item-specific data.
arXiv Detail & Related papers (2024-12-11T18:26:55Z)
- RLVF: Learning from Verbal Feedback without Overgeneralization [94.19501420241188]
We study the problem of incorporating verbal feedback without such overgeneralization.
We develop a new method Contextualized Critiques with Constrained Preference Optimization (C3PO)
Our approach effectively applies verbal feedback to relevant scenarios while preserving existing behaviors for other contexts.
arXiv Detail & Related papers (2024-02-16T18:50:24Z)
- Hierarchical Reinforcement Learning for Modeling User Novelty-Seeking Intent in Recommender Systems [26.519571240032967]
We propose a novel hierarchical reinforcement learning-based method to model the hierarchical user novelty-seeking intent.
We further incorporate diversity and novelty-related measurement in the reward function of the hierarchical RL (HRL) agent to encourage user exploration.
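A novelty term in the reward function, as this entry describes, can be sketched in a few lines. The function name, the category-based notion of novelty, and the weight `alpha` are illustrative assumptions, not the paper's formulation:

```python
def shaped_reward(relevance, category, history_categories, alpha=0.3):
    """Reward shaping sketch: base relevance plus a bonus when the
    recommended item's category is new to the user's history,
    encouraging the HRL agent to explore novel content."""
    novelty = 1.0 if category not in history_categories else 0.0
    return relevance + alpha * novelty

r_new = shaped_reward(0.8, "jazz", {"rock", "pop"})   # novel category, bonus added
r_seen = shaped_reward(0.8, "rock", {"rock", "pop"})  # familiar category, no bonus
```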
arXiv Detail & Related papers (2023-06-02T12:02:23Z)
- Editable User Profiles for Controllable Text Recommendation [66.00743968792275]
We propose LACE, a novel concept value bottleneck model for controllable text recommendations.
LACE represents each user with a succinct set of human-readable concepts.
It learns personalized representations of the concepts based on user documents.
arXiv Detail & Related papers (2023-04-09T14:52:18Z)
- Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System [11.404192885921498]
Chat-Rec is a new paradigm for building conversational recommender systems.
Chat-Rec is effective in learning user preferences and establishing connections between users and products.
In experiments, Chat-Rec effectively improves top-k recommendation results and performs better on the zero-shot rating prediction task.
arXiv Detail & Related papers (2023-03-25T17:37:43Z)
- Two-Stage Neural Contextual Bandits for Personalised News Recommendation [50.3750507789989]
Existing personalised news recommendation methods focus on exploiting user interests and ignore exploration in recommendation.
We build on contextual bandits recommendation strategies which naturally address the exploitation-exploration trade-off.
We use deep learning representations for users and news, and generalise the neural upper confidence bound (UCB) policies to generalised additive UCB and bilinear UCB.
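The exploitation-exploration trade-off that UCB policies address can be illustrated with the classic UCB1 rule, a deliberate simplification of the neural and bilinear variants this entry describes:

```python
import math

def ucb1_select(counts, means, total, c=2.0):
    """UCB1: pick the arm maximizing its empirical mean reward plus an
    exploration bonus that shrinks as the arm is sampled more often.
    Unplayed arms get an infinite bonus so each is tried at least once."""
    scores = [
        float("inf") if counts[i] == 0
        else means[i] + math.sqrt(c * math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=lambda i: scores[i])

# Arm 1 has a lower empirical mean but has been played far less,
# so its exploration bonus makes it the selected arm here.
arm = ucb1_select(counts=[10, 2], means=[0.5, 0.4], total=12)
```

The neural variants in the paper replace the per-arm empirical means with learned user/news representations, but the additive mean-plus-bonus structure is the same.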
arXiv Detail & Related papers (2022-06-26T12:07:56Z)
- Offline Meta-level Model-based Reinforcement Learning Approach for Cold-Start Recommendation [27.17948754183511]
Reinforcement learning has shown great promise in optimizing long-term user interest in recommender systems.
Existing RL-based recommendation methods need a large number of interactions for each user to learn a robust recommendation policy.
We propose a meta-level model-based reinforcement learning approach for fast user adaptation.
arXiv Detail & Related papers (2020-12-04T08:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.