Model-enhanced Contrastive Reinforcement Learning for Sequential
Recommendation
- URL: http://arxiv.org/abs/2310.16566v1
- Date: Wed, 25 Oct 2023 11:43:29 GMT
- Title: Model-enhanced Contrastive Reinforcement Learning for Sequential
Recommendation
- Authors: Chengpeng Li, Zhengyi Yang, Jizhi Zhang, Jiancan Wu, Dingxian Wang,
Xiangnan He, Xiang Wang
- Abstract summary: We propose a novel RL recommender named model-enhanced contrastive reinforcement learning (MCRL).
We learn a value function to estimate the long-term engagement of users, together with a conservative value learning mechanism to alleviate the overestimation problem.
Experiments demonstrate that the proposed method significantly outperforms existing offline RL and self-supervised RL methods.
- Score: 28.218427886174506
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Reinforcement learning (RL) has been widely applied in recommendation systems
due to its potential for optimizing users' long-term engagement. From the
perspective of RL, recommendation can be formulated as a Markov decision
process (MDP) in which the recommender system (agent) interacts with users
(environment) and receives feedback (reward signals). However, online
interaction is impractical because of user-experience concerns and
implementation complexity, so RL recommenders can only be trained on offline
datasets containing limited reward signals and state transitions. The resulting
sparsity of reward signals and state transitions is therefore severe, yet it
has long been overlooked by existing RL recommenders. Worse still, RL methods
learn by trial and error, but negative feedback is unavailable in
implicit-feedback recommendation tasks, which aggravates the overestimation
problem of offline RL recommenders. To address these challenges, we propose a
novel RL recommender named model-enhanced contrastive reinforcement learning
(MCRL). On the one hand, we learn a value function that estimates users'
long-term engagement, together with a conservative value learning mechanism
that alleviates the overestimation problem. On the other hand, we construct
positive and negative state-action pairs and model the reward and
state-transition functions with contrastive learning, exploiting the internal
structural information of the MDP. Experiments on two real-world datasets
demonstrate that the proposed method significantly outperforms existing offline
RL and self-supervised RL methods across different representative backbone
networks.
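To make the "conservative value learning" idea concrete, below is a minimal sketch of a conservative Q-learning-style loss for an offline recommender, which penalizes large Q-values on items that never appear in the logged data. The abstract does not specify this exact formulation; the network interfaces, tensor shapes, and the penalty weight `alpha` are assumptions made purely for illustration.

```python
# Illustrative sketch only: a CQL-style conservative value-learning loss for an
# offline RL recommender. The exact mechanism used by MCRL is not specified in
# the abstract; names, shapes, and `alpha` below are assumptions.
import torch
import torch.nn.functional as F

def conservative_q_loss(q_net, target_q_net, states, actions, rewards,
                        next_states, gamma=0.99, alpha=1.0):
    """Loss on one batch of logged interactions.

    states, next_states: (B, d) user-state embeddings
    actions:             (B,)   indices of recommended items
    rewards:             (B,)   observed implicit feedback
    """
    q_all = q_net(states)                                   # (B, num_items)
    q_taken = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        next_q = target_q_net(next_states).max(dim=1).values
        td_target = rewards + gamma * next_q

    td_loss = F.mse_loss(q_taken, td_target)

    # Conservative penalty: push down Q-values over all items (logsumexp)
    # while pushing up the logged action's value, counteracting the
    # overestimation that arises when negative feedback is never observed.
    conservative_penalty = (torch.logsumexp(q_all, dim=1) - q_taken).mean()
    return td_loss + alpha * conservative_penalty
```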
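The contrastive modeling of reward and transition structure can likewise be sketched with an InfoNCE-style objective over positive and negative state-action pairs. Again, this is only one plausible instantiation: the encoders, the way negatives are sampled, and the temperature are assumptions, since the abstract only states that positive and negative state-action pairs are contrasted to exploit the internal structure of the MDP.

```python
# Illustrative sketch only: an InfoNCE-style contrastive loss over state-action
# pairs, in the spirit of MCRL's model-enhanced component. Encoders, negative
# sampling, and the temperature are assumptions for illustration.
import torch
import torch.nn.functional as F

def contrastive_pair_loss(state_enc, action_enc, states, pos_actions,
                          neg_actions, temperature=0.1):
    """states: (B, d); pos_actions: (B,) item ids; neg_actions: (B, K) item ids."""
    s = F.normalize(state_enc(states), dim=-1)               # (B, h)
    a_pos = F.normalize(action_enc(pos_actions), dim=-1)     # (B, h)
    a_neg = F.normalize(action_enc(neg_actions), dim=-1)     # (B, K, h)

    pos_logit = (s * a_pos).sum(-1, keepdim=True) / temperature      # (B, 1)
    neg_logits = torch.einsum('bh,bkh->bk', s, a_neg) / temperature  # (B, K)

    logits = torch.cat([pos_logit, neg_logits], dim=1)       # (B, 1 + K)
    labels = torch.zeros(len(states), dtype=torch.long, device=logits.device)
    # Cross-entropy with the positive pair in column 0 is the InfoNCE loss.
    return F.cross_entropy(logits, labels)
```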
Related papers
- ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems [14.74207332728742]
Offline reinforcement learning (RL) is an effective tool for real-world recommender systems.
This paper proposes ROLeR, a model-based approach to reward shaping and uncertainty estimation in offline RL for recommender systems.
arXiv Detail & Related papers (2024-07-18T05:07:11Z) - Hybrid Inverse Reinforcement Learning [34.793570631021005]
The inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
arXiv Detail & Related papers (2024-02-13T23:29:09Z) - Hybrid Reinforcement Learning for Optimizing Pump Sustainability in
Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z) - Robust Reinforcement Learning Objectives for Sequential Recommender Systems [7.44049827436013]
We develop recommender systems that incorporate direct user feedback in the form of rewards, enhancing personalization for users.
Employing RL algorithms, however, presents challenges, including off-policy training, expansive action spaces, and the scarcity of datasets with sufficient reward signals.
We introduce an enhanced methodology aimed at providing a more effective solution to these challenges.
arXiv Detail & Related papers (2023-05-30T08:09:08Z) - Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning [93.99377042564919]
This paper tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages.
The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies.
We introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces.
arXiv Detail & Related papers (2023-05-24T15:45:35Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
arXiv Detail & Related papers (2021-11-05T12:51:15Z) - Improving Long-Term Metrics in Recommendation Systems using
Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.