Generative Slate Recommendation with Reinforcement Learning
- URL: http://arxiv.org/abs/2301.08632v2
- Date: Tue, 24 Jan 2023 10:29:43 GMT
- Title: Generative Slate Recommendation with Reinforcement Learning
- Authors: Romain Deffayet, Thibaut Thonet, Jean-Michel Renders, Maarten de Rijke
- Abstract summary: Reinforcement learning (RL) algorithms can be used to optimize long-term user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
- Score: 49.75985313698214
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent research has employed reinforcement learning (RL) algorithms to
optimize long-term user engagement in recommender systems, thereby avoiding
common pitfalls such as user boredom and filter bubbles. They capture the
sequential and interactive nature of recommendations, and thus offer a
principled way to deal with long-term rewards and avoid myopic behaviors.
However, RL approaches are intractable in the slate recommendation scenario -
where a list of items is recommended at each interaction turn - due to the
combinatorial action space. In that setting, an action corresponds to a slate
that may contain any combination of items.
While previous work has proposed well-chosen decompositions of actions so as
to ensure tractability, these rely on restrictive and sometimes unrealistic
assumptions. Instead, in this work we propose to encode slates in a continuous,
low-dimensional latent space learned by a variational auto-encoder. Then, the
RL agent selects continuous actions in this latent space, which are ultimately
decoded into the corresponding slates. By doing so, we are able to (i) relax
assumptions required by previous work, and (ii) improve the quality of the
action selection by modeling full slates instead of independent items, in
particular by enabling diversity. Our experiments performed on a wide array of
simulated environments confirm the effectiveness of our generative modeling of
slates over baselines in practical scenarios where the restrictive assumptions
underlying the baselines are lifted. Our findings suggest that representation
learning using generative models is a promising direction towards generalizable
RL-based slate recommendation.
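To make the proposed pipeline concrete, the following is a minimal sketch of the generative slate idea, assuming a PyTorch implementation with illustrative dimensions, a simple MLP encoder/decoder, and nearest-item decoding of slate positions; all names and architectural details below are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of the paper's high-level pipeline (not the authors' code):
# a slate VAE plus an agent acting in the learned latent space. All names,
# dimensions, and the nearest-item decoding step are illustrative assumptions.
import torch
import torch.nn as nn

NUM_ITEMS, SLATE_SIZE, EMB_DIM, LATENT_DIM = 1000, 5, 32, 16

class SlateVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.item_emb = nn.Embedding(NUM_ITEMS, EMB_DIM)
        self.encoder = nn.Sequential(
            nn.Linear(SLATE_SIZE * EMB_DIM, 64), nn.ReLU(),
            nn.Linear(64, 2 * LATENT_DIM),  # mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, SLATE_SIZE * EMB_DIM),
        )

    def encode(self, slates):                    # slates: (B, SLATE_SIZE) item ids
        flat = self.item_emb(slates).flatten(1)  # (B, SLATE_SIZE * EMB_DIM)
        mu, logvar = self.encoder(flat).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar
        # (VAE training with reconstruction + KL terms omitted for brevity)

    def decode(self, z):
        # Decode the latent action into one embedding per slate position, then
        # map each to its nearest item embedding (an assumed discretization step).
        out = self.decoder(z).view(-1, SLATE_SIZE, EMB_DIM)
        scores = out @ self.item_emb.weight.T    # (B, SLATE_SIZE, NUM_ITEMS)
        return scores.argmax(dim=-1)             # (B, SLATE_SIZE) item ids

vae = SlateVAE()
# The RL agent (an actor network; the actor-critic details are assumed here)
# outputs a continuous action in the latent space, given a state encoding.
actor = nn.Sequential(nn.Linear(EMB_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))

state = torch.randn(1, EMB_DIM)   # placeholder user-state representation
slate = vae.decode(actor(state))  # continuous latent action -> concrete slate
print(slate)                      # e.g. tensor([[ 12, 845,   3, 431,  77]])
```

The appeal of this design, per the abstract, is that the agent always selects a fixed-size continuous action regardless of catalogue size; the combinatorics of assembling a full slate are absorbed by the learned decoder.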
Related papers
- Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation [47.29682938439268]
We propose a novel Counterfactual Fine-Tuning (CFT) method to improve user preference modeling.
We employ counterfactual reasoning to identify the causal effects of behavior sequences on model output.
Experiments on real-world datasets demonstrate that CFT effectively improves behavior sequence modeling.
arXiv Detail & Related papers (2024-10-30T08:41:13Z)
- An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation [14.506332665769746]
We propose an Efficient Continuous Control framework (ECoC).
Based on a statistically tested assumption, we first propose a novel unified action representation abstracted from normalized user and item spaces.
Strategic exploration and directional control in terms of these unified actions are carefully designed and are crucial to the final recommendation decisions.
arXiv Detail & Related papers (2024-08-15T09:26:26Z)
- LIRE: listwise reward enhancement for preference alignment [27.50204023448716]
We propose a gradient-based reward optimization approach that incorporates the offline rewards of multiple responses into a streamlined listwise framework.
LIRE is straightforward to implement, requiring minimal parameter tuning, and seamlessly aligns with the pairwise paradigm.
Our experiments demonstrate that LIRE consistently outperforms existing methods across several benchmarks on dialogue and summarization tasks.
arXiv Detail & Related papers (2024-05-22T10:21:50Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential recommendation describes a set of techniques that model dynamic user behavior in order to predict future interactions in sequential user data.
Old and new issues remain, including data sparsity and noisy data.
We propose Contrastive Self-Supervised Learning for Sequential Recommendation (CoSeRec).
arXiv Detail & Related papers (2021-08-14T07:15:25Z)
- Sequence Adaptation via Reinforcement Learning in Recommender Systems [8.909115457491522]
We propose the SAR model, which learns the sequential patterns and adjusts the sequence length of user-item interactions in a personalized manner.
In addition, we optimize a joint loss function to align the accuracy of the sequential recommendations with the expected cumulative rewards of the critic network.
Our experimental evaluation on four real-world datasets demonstrates the superiority of our proposed model over several baseline approaches.
arXiv Detail & Related papers (2021-07-31T13:56:46Z)
- Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC); a minimal sketch of this two-head design appears after the list below.
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
- Sequential Recommendation with Self-Attentive Multi-Adversarial Network [101.25533520688654]
We present a Multi-Factor Generative Adversarial Network (MFGAN) for explicitly modeling the effect of context information on sequential recommendation.
Our framework is flexible to incorporate multiple kinds of factor information, and is able to trace how each factor contributes to the recommendation decision over time.
arXiv Detail & Related papers (2020-05-21T12:28:59Z)
- Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
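As a concrete illustration of the two-output-layer idea from the Self-Supervised Reinforcement Learning entry above, the following sketch pairs a supervised next-item head with a Q-learning head on a shared sequence encoder; the GRU encoder, dimensions, and the simplified one-step TD loss are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of a two-head recommender in the spirit of SQN (details
# assumed): a shared sequence encoder feeding a supervised next-item head
# and a Q-learning head trained on observed rewards.
import torch
import torch.nn as nn

NUM_ITEMS, HIDDEN = 1000, 64

class TwoHeadRecommender(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(NUM_ITEMS, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.supervised_head = nn.Linear(HIDDEN, NUM_ITEMS)  # next-item logits
        self.q_head = nn.Linear(HIDDEN, NUM_ITEMS)           # per-item Q-values

    def forward(self, seq):                 # seq: (B, T) item ids
        _, h = self.encoder(self.emb(seq))  # h: (1, B, HIDDEN)
        state = h.squeeze(0)
        return self.supervised_head(state), self.q_head(state)

model = TwoHeadRecommender()
seq = torch.randint(0, NUM_ITEMS, (8, 10))  # batch of interaction histories
target = torch.randint(0, NUM_ITEMS, (8,))  # observed next items
reward = torch.rand(8)                      # e.g. click/purchase signals

logits, q = model(seq)
ce_loss = nn.functional.cross_entropy(logits, target)
# One-step regression of Q toward the immediate reward of the taken action
# (an illustrative simplification; a full TD target with a target network
# would be used in practice).
q_taken = q.gather(1, target.unsqueeze(1)).squeeze(1)
td_loss = nn.functional.mse_loss(q_taken, reward)
loss = ce_loss + td_loss
loss.backward()
```

In this pattern the supervised head keeps the model anchored to observed next-item behavior while the Q head injects the reward signal; the two losses are simply summed.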