Generative Slate Recommendation with Reinforcement Learning
- URL: http://arxiv.org/abs/2301.08632v2
- Date: Tue, 24 Jan 2023 10:29:43 GMT
- Title: Generative Slate Recommendation with Reinforcement Learning
- Authors: Romain Deffayet, Thibaut Thonet, Jean-Michel Renders, Maarten de Rijke
- Abstract summary: Reinforcement learning (RL) algorithms can be used to optimize long-term user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
- Score: 49.75985313698214
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent research has employed reinforcement learning (RL) algorithms to
optimize long-term user engagement in recommender systems, thereby avoiding
common pitfalls such as user boredom and filter bubbles. They capture the
sequential and interactive nature of recommendations, and thus offer a
principled way to deal with long-term rewards and avoid myopic behaviors.
However, RL approaches are intractable in the slate recommendation scenario -
where a list of items is recommended at each interaction turn - due to the
combinatorial action space. In that setting, an action corresponds to a slate
that may contain any combination of items.
While previous work has proposed well-chosen decompositions of actions so as
to ensure tractability, these rely on restrictive and sometimes unrealistic
assumptions. Instead, in this work we propose to encode slates in a continuous,
low-dimensional latent space learned by a variational auto-encoder. Then, the
RL agent selects continuous actions in this latent space, which are ultimately
decoded into the corresponding slates. By doing so, we are able to (i) relax
assumptions required by previous work, and (ii) improve the quality of the
action selection by modeling full slates instead of independent items, in
particular by enabling diversity. Our experiments performed on a wide array of
simulated environments confirm the effectiveness of our generative modeling of
slates over baselines in practical scenarios where the restrictive assumptions
underlying the baselines are lifted. Our findings suggest that representation
learning using generative models is a promising direction towards generalizable
RL-based slate recommendation.
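To make the proposed pipeline concrete, the following is a minimal sketch of the generative slate idea, assuming a PyTorch implementation with illustrative dimensions, a simple MLP encoder/decoder, and nearest-item decoding of slate positions; all names and architectural details below are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of the paper's high-level pipeline (not the authors' code):
# a slate VAE plus an agent acting in the learned latent space. All names,
# dimensions, and the nearest-item decoding step are illustrative assumptions.
import torch
import torch.nn as nn

NUM_ITEMS, SLATE_SIZE, EMB_DIM, LATENT_DIM = 1000, 5, 32, 16

class SlateVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.item_emb = nn.Embedding(NUM_ITEMS, EMB_DIM)
        self.encoder = nn.Sequential(
            nn.Linear(SLATE_SIZE * EMB_DIM, 64), nn.ReLU(),
            nn.Linear(64, 2 * LATENT_DIM),  # mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, SLATE_SIZE * EMB_DIM),
        )

    def encode(self, slates):                    # slates: (B, SLATE_SIZE) item ids
        flat = self.item_emb(slates).flatten(1)  # (B, SLATE_SIZE * EMB_DIM)
        mu, logvar = self.encoder(flat).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar
        # (VAE training with reconstruction + KL terms omitted for brevity)

    def decode(self, z):
        # Decode the latent action into one embedding per slate position, then
        # map each to its nearest item embedding (an assumed discretization step).
        out = self.decoder(z).view(-1, SLATE_SIZE, EMB_DIM)
        scores = out @ self.item_emb.weight.T    # (B, SLATE_SIZE, NUM_ITEMS)
        return scores.argmax(dim=-1)             # (B, SLATE_SIZE) item ids

vae = SlateVAE()
# The RL agent (an actor network; the actor-critic details are assumed here)
# outputs a continuous action in the latent space, given a state encoding.
actor = nn.Sequential(nn.Linear(EMB_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))

state = torch.randn(1, EMB_DIM)   # placeholder user-state representation
slate = vae.decode(actor(state))  # continuous latent action -> concrete slate
print(slate)                      # e.g. tensor([[ 12, 845,   3, 431,  77]])
```

The appeal of this design, per the abstract, is that the agent always selects a fixed-size continuous action regardless of catalogue size; the combinatorics of assembling a full slate are absorbed by the learned decoder.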
Related papers
- Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation [47.29682938439268]
We propose a novel Counterfactual Fine-Tuning (CFT) method to improve user preference modeling.
We employ counterfactual reasoning to identify the causal effects of behavior sequences on model output.
Experiments on real-world datasets demonstrate that CFT effectively improves behavior sequence modeling.
arXiv Detail & Related papers (2024-10-30T08:41:13Z)
- An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation [14.506332665769746]
We propose an Efficient Continuous Control framework (ECoC).
Based on a statistically tested assumption, we first propose a novel unified action representation abstracted from normalized user and item spaces.
Strategic exploration and directional control in terms of these unified actions are carefully designed and are crucial to the final recommendation decisions.
arXiv Detail & Related papers (2024-08-15T09:26:26Z)
- LIRE: listwise reward enhancement for preference alignment [27.50204023448716]
We propose a gradient-based reward optimization approach that incorporates the offline rewards of multiple responses into a streamlined listwise framework.
LIRE is straightforward to implement, requiring minimal parameter tuning, and seamlessly aligns with the pairwise paradigm.
Our experiments demonstrate that LIRE consistently outperforms existing methods across several benchmarks on dialogue and summarization tasks.
arXiv Detail & Related papers (2024-05-22T10:21:50Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential recommendation describes a set of techniques that model dynamic user behavior in order to predict future interactions in sequential user data.
Old and new issues remain, including data sparsity and noisy data.
We propose Contrastive Self-Supervised Learning for Sequential Recommendation (CoSeRec).
arXiv Detail & Related papers (2021-08-14T07:15:25Z)
- Sequence Adaptation via Reinforcement Learning in Recommender Systems [8.909115457491522]
We propose the SAR model, which learns the sequential patterns and adjusts the sequence length of user-item interactions in a personalized manner.
In addition, we optimize a joint loss function to align the accuracy of the sequential recommendations with the expected cumulative rewards of the critic network.
Our experimental evaluation on four real-world datasets demonstrates the superiority of our proposed model over several baseline approaches.
arXiv Detail & Related papers (2021-07-31T13:56:46Z)
- Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC); a minimal sketch of this two-head design appears after the list below.
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
- Sequential Recommendation with Self-Attentive Multi-Adversarial Network [101.25533520688654]
We present a Multi-Factor Generative Adversarial Network (MFGAN) for explicitly modeling the effect of context information on sequential recommendation.
Our framework is flexible to incorporate multiple kinds of factor information, and is able to trace how each factor contributes to the recommendation decision over time.
arXiv Detail & Related papers (2020-05-21T12:28:59Z)
- Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
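As a concrete illustration of the two-output-layer idea from the Self-Supervised Reinforcement Learning entry above, the following sketch pairs a supervised next-item head with a Q-learning head on a shared sequence encoder; the GRU encoder, dimensions, and the simplified one-step TD loss are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of a two-head recommender in the spirit of SQN (details
# assumed): a shared sequence encoder feeding a supervised next-item head
# and a Q-learning head trained on observed rewards.
import torch
import torch.nn as nn

NUM_ITEMS, HIDDEN = 1000, 64

class TwoHeadRecommender(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(NUM_ITEMS, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.supervised_head = nn.Linear(HIDDEN, NUM_ITEMS)  # next-item logits
        self.q_head = nn.Linear(HIDDEN, NUM_ITEMS)           # per-item Q-values

    def forward(self, seq):                 # seq: (B, T) item ids
        _, h = self.encoder(self.emb(seq))  # h: (1, B, HIDDEN)
        state = h.squeeze(0)
        return self.supervised_head(state), self.q_head(state)

model = TwoHeadRecommender()
seq = torch.randint(0, NUM_ITEMS, (8, 10))  # batch of interaction histories
target = torch.randint(0, NUM_ITEMS, (8,))  # observed next items
reward = torch.rand(8)                      # e.g. click/purchase signals

logits, q = model(seq)
ce_loss = nn.functional.cross_entropy(logits, target)
# One-step regression of Q toward the immediate reward of the taken action
# (an illustrative simplification; a full TD target with a target network
# would be used in practice).
q_taken = q.gather(1, target.unsqueeze(1)).squeeze(1)
td_loss = nn.functional.mse_loss(q_taken, reward)
loss = ce_loss + td_loss
loss.backward()
```

In this pattern the supervised head keeps the model anchored to observed next-item behavior while the Q head injects the reward signal; the two losses are simply summed.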