Self-Supervised Reinforcement Learning for Recommender Systems
- URL: http://arxiv.org/abs/2006.05779v2
- Date: Thu, 11 Jun 2020 09:36:45 GMT
- Title: Self-Supervised Reinforcement Learning for Recommender Systems
- Authors: Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose
- Abstract summary: We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
- Score: 77.38665506495553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In session-based or sequential recommendation, it is important to consider
factors such as long-term user engagement and multiple types of user-item
interactions (e.g., clicks and purchases). Current state-of-the-art
supervised approaches fail to model these factors appropriately. Casting the sequential
recommendation task as a reinforcement learning (RL) problem is a promising
direction. A major component of RL approaches is to train the agent through
interactions with the environment. However, it is often problematic to train a
recommender in an on-line fashion due to the requirement to expose users to
irrelevant recommendations. As a result, learning the policy from logged
implicit feedback is of vital importance, which is challenging due to the pure
off-policy setting and lack of negative rewards (feedback). In this paper, we
propose self-supervised reinforcement learning for sequential recommendation
tasks. Our approach augments standard recommendation models with two output
layers: one for self-supervised learning and the other for RL. The RL part acts
as a regularizer that drives the supervised layer to focus on specific
rewards (e.g., recommending items that may lead to purchases rather than
clicks), while the self-supervised layer with cross-entropy loss provides strong
gradient signals for parameter updates. Based on this approach, we propose
two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised
Actor-Critic (SAC). We integrate the proposed frameworks with four
state-of-the-art recommendation models. Experimental results on two real-world
datasets demonstrate the effectiveness of our approach.
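The two-head design described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `sqn_style_loss`, the weight matrices, the reward values, and the discount factor are all assumptions chosen for illustration, and the RL head uses a plain one-step TD target with a separate target network, which may differ from the paper's exact update. Both heads read the same state representation. The cross-entropy term trains next-item prediction, while the TD term pulls the shared representation toward the logged rewards.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def sqn_style_loss(state, next_state, action, reward, done,
                   w_sup, w_q, w_q_target, gamma=0.5):
    """Joint loss for a model with a supervised head and a Q-learning head.

    state, next_state: (batch, hidden) outputs of a shared sequence encoder
    action:            (batch,) index of the logged next item
    reward:            (batch,) logged reward (hypothetical values, e.g. a
                       purchase rewarded more than a click)
    done:              (batch,) 1.0 if the session ended, else 0.0
    w_sup, w_q:        (hidden, n_items) weights of the two output layers
    w_q_target:        (hidden, n_items) target-network copy of w_q
    """
    rows = np.arange(len(action))

    # Self-supervised head: cross-entropy on the logged next item.
    logits = state @ w_sup
    ce_loss = -log_softmax(logits)[rows, action].mean()

    # RL head: one-step temporal-difference error on the logged transition.
    q_taken = (state @ w_q)[rows, action]
    next_q = (next_state @ w_q_target).max(axis=-1)
    td_target = reward + gamma * (1.0 - done) * next_q
    td_loss = 0.5 * ((td_target - q_taken) ** 2).mean()

    # The two losses are summed, so the RL term acts as a regularizer.
    return ce_loss + td_loss, ce_loss, td_loss


# Toy usage with random encoder outputs (all shapes and values are assumptions).
rng = np.random.default_rng(0)
hidden, n_items = 8, 5
state = rng.normal(size=(2, hidden))
next_state = rng.normal(size=(2, hidden))
w_sup = rng.normal(size=(hidden, n_items))
w_q = rng.normal(size=(hidden, n_items))
total, ce, td = sqn_style_loss(
    state, next_state,
    action=np.array([1, 3]),
    reward=np.array([1.0, 5.0]),  # hypothetical click vs. purchase rewards
    done=np.array([0.0, 1.0]),
    w_sup=w_sup, w_q=w_q, w_q_target=w_q.copy(),
)
print(f"total={total:.4f} cross_entropy={ce:.4f} td={td:.4f}")
```

In a real model the two heads would sit on top of a trained sequence encoder and the gradient of the summed loss would update the shared parameters; the sketch only computes the loss value.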
Related papers
- Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from
Imperfect Demonstration for Interactive Recommendation [23.048841953423846]
We focus on the problem of learning to reward, which is fundamental to reinforcement learning.
Some previous approaches introduce additional procedures for learning to reward, thereby increasing the complexity of optimization.
We propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties.
arXiv Detail & Related papers (2023-10-30T13:43:20Z)
- Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation [28.218427886174506]
We propose a novel RL recommender named model-enhanced contrastive reinforcement learning (MCRL).
We learn a value function to estimate the long-term engagement of users, together with a conservative value learning mechanism to alleviate the overestimation problem.
Experiments demonstrate that the proposed method significantly outperforms existing offline RL and self-supervised RL methods.
arXiv Detail & Related papers (2023-10-25T11:43:29Z)
- Multi-behavior Self-supervised Learning for Recommendation [36.42241501002167]
We propose a Multi-Behavior Self-Supervised Learning (MBSSL) framework together with an adaptive optimization method.
Specifically, we devise a behavior-aware graph neural network incorporating the self-attention mechanism to capture behavior multiplicity and dependencies.
Experiments on five real-world datasets demonstrate the consistent improvements obtained by MBSSL over ten state-of-the-art (SOTA) baselines.
arXiv Detail & Related papers (2023-05-22T15:57:32Z)
- WSLRec: Weakly Supervised Learning for Neural Sequential Recommendation Models [24.455665093145818]
We propose a novel model-agnostic training approach called WSLRec, which adopts a three-stage framework: pre-training, top-$k$ mining, and fine-tuning.
WSLRec resolves the incompleteness problem by pre-training models on extra weak supervisions from model-free methods like BR and ItemCF, while resolving the inaccuracy problem by leveraging the top-$k$ mining to screen out reliable user-item relevance from weak supervisions for fine-tuning.
arXiv Detail & Related papers (2022-02-28T08:55:12Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate the resulting frameworks, Supervised Negative Q-learning (SNQN) and Supervised Advantage Actor-Critic (SA2C), with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- Choosing the Best of Both Worlds: Diverse and Novel Recommendations through Multi-Objective Reinforcement Learning [68.45370492516531]
We introduce Scalarized Multi-Objective Reinforcement Learning (SMORL) for the Recommender Systems (RS) setting.
The SMORL agent augments standard recommendation models with additional RL layers that force it to simultaneously satisfy three principal objectives: accuracy, diversity, and novelty of recommendations.
Our experimental results on two real-world datasets reveal a substantial increase in aggregate diversity, a moderate increase in accuracy, reduced repetitiveness of recommendations, and demonstrate the importance of reinforcing diversity and novelty as complementary objectives.
arXiv Detail & Related papers (2021-10-28T13:22:45Z)
- Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential recommendation describes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data.
Old and new issues remain, including data-sparsity and noisy data.
We propose Contrastive Self-Supervised Learning for sequential Recommendation (CoSeRec).
arXiv Detail & Related papers (2021-08-14T07:15:25Z)
- Generative Inverse Deep Reinforcement Learning for Online Recommendation [62.09946317831129]
We propose a novel inverse reinforcement learning approach, namely InvRec, for online recommendation.
InvRec automatically extracts the reward function from users' behaviors.
arXiv Detail & Related papers (2020-11-04T12:12:25Z)
- Sequential Recommendation with Self-Attentive Multi-Adversarial Network [101.25533520688654]
We present a Multi-Factor Generative Adversarial Network (MFGAN) for explicitly modeling the effect of context information on sequential recommendation.
Our framework is flexible enough to incorporate multiple kinds of factor information, and is able to trace how each factor contributes to the recommendation decision over time.
arXiv Detail & Related papers (2020-05-21T12:28:59Z)