Model-free Reinforcement Learning with Stochastic Reward Stabilization
for Recommender Systems
- URL: http://arxiv.org/abs/2308.13246v1
- Date: Fri, 25 Aug 2023 08:42:45 GMT
- Title: Model-free Reinforcement Learning with Stochastic Reward Stabilization
for Recommender Systems
- Authors: Tianchi Cai, Shenliao Bao, Jiyan Jiang, Shiji Zhou, Wenpeng Zhang,
Lihong Gu, Jinjie Gu, Guannan Zhang
- Abstract summary: One user's feedback on the same item at different times is random.
We design two reward stabilization frameworks that replace the direct feedback with that learned by a supervised model.
- Score: 20.395091290715502
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Model-free RL-based recommender systems have recently received increasing
research attention due to their capability to handle partial feedback and
long-term rewards. However, most existing research has ignored a critical
feature in recommender systems: one user's feedback on the same item at
different times is random. The stochastic rewards property essentially differs
from that in classic RL scenarios with deterministic rewards, which makes
RL-based recommender systems much more challenging. In this paper, we first
demonstrate in a simulator environment that using direct stochastic feedback
results in a significant drop in performance. Then, to handle the stochastic
feedback more efficiently, we design two stochastic reward stabilization
frameworks that replace the direct stochastic feedback with that learned by a
supervised model. Both frameworks are model-agnostic, i.e., they can
effectively utilize various supervised models. We demonstrate the superiority
of the proposed frameworks over different RL-based recommendation baselines
with extensive experiments on a recommendation simulator as well as an
industrial-level recommender system.
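The abstract states the core mechanism (replace the raw stochastic feedback with a reward learned by a supervised model) but gives no implementation details. Below is a minimal Python sketch of that idea, assuming a tabular Q-learning agent and a logistic-regression reward model; the names RewardStabilizer and q_update, the feature layout, and the toy data are illustrative assumptions, not the paper's actual frameworks.

    # Minimal sketch: stabilize stochastic user feedback with a supervised
    # reward model, then feed the stabilized reward to an RL update.
    # RewardStabilizer / q_update and the toy data are illustrative assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    class RewardStabilizer:
        """Supervised model mapping (state, item) features to an expected reward."""
        def __init__(self):
            self.model = LogisticRegression()

        def fit(self, features, stochastic_feedback):
            # features: (N, d) concatenated state/item features
            # stochastic_feedback: (N,) noisy binary feedback, e.g. clicks
            self.model.fit(features, stochastic_feedback)

        def stabilized_reward(self, features):
            # Predicted expected feedback replaces the raw noisy signal.
            return self.model.predict_proba(features)[:, 1]

    def q_update(q, state_id, item_id, reward, next_max_q, alpha=0.1, gamma=0.9):
        """One tabular Q-learning step driven by the stabilized reward."""
        td_target = reward + gamma * next_max_q
        q[state_id, item_id] += alpha * (td_target - q[state_id, item_id])
        return q

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        feats = rng.normal(size=(1000, 8))                      # logged features
        clicks = rng.binomial(1, 1 / (1 + np.exp(-feats @ rng.normal(size=8))))

        stabilizer = RewardStabilizer()
        stabilizer.fit(feats, clicks)

        q = np.zeros((5, 10))
        r = stabilizer.stabilized_reward(feats[:1])[0]          # expected, not noisy
        q = q_update(q, state_id=0, item_id=3, reward=r, next_max_q=q[1].max())

The design choice mirrors the abstract: the RL agent never sees a single noisy click; it sees the supervised model's expectation, which has much lower variance.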
Related papers
- ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems [14.74207332728742]
Offline reinforcement learning (RL) is an effective tool for real-world recommender systems.
This paper proposes ROLeR, a novel model-based approach to reward shaping in offline RL for recommender systems, targeting reward and uncertainty estimation.
arXiv Detail & Related papers (2024-07-18T05:07:11Z)
- RewardBench: Evaluating Reward Models for Language Modeling [100.28366840977966]
We present RewardBench, a benchmark dataset and code-base for evaluation of reward models.
The dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety.
On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods.
arXiv Detail & Related papers (2024-03-20T17:49:54Z)
- Deep Reinforcement Learning from Hierarchical Preference Design [99.46415116087259]
This paper shows that by exploiting certain structures, one can ease the reward design process.
We propose a hierarchical reward modeling framework, HERON, for two scenarios: (I) the feedback signals naturally present a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning.
arXiv Detail & Related papers (2023-09-06T00:44:29Z)
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- RGRecSys: A Toolkit for Robustness Evaluation of Recommender Systems [100.54655931138444]
We propose a more holistic view of robustness for recommender systems that encompasses multiple dimensions.
We present a robustness evaluation toolkit, Robustness Gym for RecSys, that allows us to quickly and uniformly evaluate the robustness of recommender system models.
arXiv Detail & Related papers (2022-01-12T10:32:53Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on the sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case (see the sketch after the end of this list).
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- Recommendation Fairness: From Static to Dynamic [12.080824433982993]
We discuss how fairness could be baked into reinforcement learning techniques for recommendation.
We argue that in order to make further progress in recommendation fairness, we may want to consider multi-agent (game-theoretic) optimization and multi-objective (Pareto) optimization.
arXiv Detail & Related papers (2021-09-05T21:38:05Z)
- Top-N Recommendation with Counterfactual User Preference Simulation [26.597102553608348]
Top-N recommendation, which aims to learn users' ranking-based preferences, has long been a fundamental problem in a wide range of applications.
In this paper, we propose to reformulate the recommendation task within the causal inference framework to handle the data scarcity problem.
arXiv Detail & Related papers (2021-09-02T14:28:46Z)
- Fast Multi-Step Critiquing for VAE-based Recommender Systems [27.207067974031805]
We present M&Ms-VAE, a novel variational autoencoder for recommendation and explanation.
We train the model under a weak supervision scheme to simulate both fully and partially observed variables.
We then leverage the generalization ability of a trained M&Ms-VAE model to embed the user preference and the critique separately.
arXiv Detail & Related papers (2021-05-03T12:26:09Z)
- Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
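The Supervised Advantage Actor-Critic entry above mentions computing the "advantage" of a positive action over the average case from sampled negative items. The snippet below is an illustrative Python sketch of one such estimator, assuming per-item Q-values are already produced by a critic; the function name and the uniform negative sampling are assumptions and may differ from the cited paper's exact formulation.

    # Illustrative advantage estimate from sampled negative items (assumption:
    # Q-values for the current state are already computed by a critic).
    import numpy as np

    def advantage_over_negatives(q_values, positive_item, negative_items):
        """Advantage of the observed item over the mean Q of sampled negatives."""
        baseline = np.mean([q_values[i] for i in negative_items])
        return q_values[positive_item] - baseline

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n_items = 100
        q_values = rng.normal(size=n_items)       # critic output Q(s, a) per item
        positive_item = 7                         # item the user interacted with
        negatives = rng.choice(
            [i for i in range(n_items) if i != positive_item], size=10, replace=False
        )
        adv = advantage_over_negatives(q_values, positive_item, negatives)
        print(f"advantage = {adv:.3f}")           # >0 up-weights the actor gradient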