AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term
User Engagement
- URL: http://arxiv.org/abs/2310.03984v1
- Date: Fri, 6 Oct 2023 02:45:21 GMT
- Title: AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term
User Engagement
- Authors: Zhenghai Xue, Qingpeng Cai, Tianyou Zuo, Bin Yang, Lantao Hu, Peng
Jiang, Kun Gai, Bo An
- Abstract summary: We introduce a novel paradigm called Adaptive Sequential Recommendation (AdaRec) to address this issue.
AdaRec proposes a new distance-based representation loss to extract latent information from users' interaction trajectories.
We conduct extensive empirical analyses in both simulator-based and live sequential recommendation tasks.
- Score: 25.18963930580529
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Growing attention has been paid to Reinforcement Learning (RL) algorithms
when optimizing long-term user engagement in sequential recommendation tasks.
One challenge in large-scale online recommendation systems is the constant and
complicated changes in users' behavior patterns, such as interaction rates and
retention tendencies. When formulated as a Markov Decision Process (MDP), the
dynamics and reward functions of the recommendation system are continuously
affected by these changes. Existing RL algorithms for recommendation systems
will suffer from distribution shift and struggle to adapt in such an MDP. In
this paper, we introduce a novel paradigm called Adaptive Sequential
Recommendation (AdaRec) to address this issue. AdaRec proposes a new
distance-based representation loss to extract latent information from users'
interaction trajectories. Such information reflects how the RL policy fits
current user behavior patterns, and helps the policy identify subtle changes
in the recommendation system. To adapt rapidly to these changes, AdaRec
encourages exploration with the idea of optimism under uncertainty. The
exploration is further guarded by zero-order action optimization to ensure
stable recommendation quality in complicated environments. We conduct extensive
empirical analyses in both simulator-based and live sequential recommendation
tasks, where AdaRec exhibits superior long-term performance compared to all
baseline algorithms.
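The abstract's "zero-order action optimization" refers to gradient-free refinement of the policy's proposed action. As a minimal illustrative sketch (not the paper's actual implementation; the critic `q_value` and all parameter names here are hypothetical), one common form samples perturbations around the base action and keeps the candidate the critic scores highest:

```python
import numpy as np

def zero_order_action_search(base_action, q_value, sigma=0.1, n_samples=16, seed=0):
    """Gradient-free refinement of a policy's proposed action.

    Samples Gaussian perturbations around `base_action`, scores each
    candidate with the critic `q_value`, and returns the best one. The
    unperturbed action is always included among the candidates, so the
    result never scores worse than the policy's own proposal.
    """
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal((n_samples, base_action.shape[0]))
    candidates = np.vstack([base_action, base_action + noise])
    scores = np.array([q_value(a) for a in candidates])
    return candidates[int(np.argmax(scores))]

# Toy critic: prefers actions close to a fixed target vector.
target = np.array([0.5, -0.2])
q = lambda a: -np.sum((a - target) ** 2)

base = np.zeros(2)
refined = zero_order_action_search(base, q)
```

Because the unperturbed action is included in the candidate set, this search can only improve (or match) the critic's score, which is one way such a guard can keep exploration from degrading recommendation quality.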
Related papers
- Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments.
Online RL-based recommender systems also face challenges in production deployment due to the risk of exposing users to untrained or unstable policies.
Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline.
We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
arXiv Detail & Related papers (2025-01-23T16:37:44Z)
- Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery [3.549243565065057]
Imitation learning is a data-driven approach to learning policies from expert behavior.
It is prone to unreliable outcomes in out-of-sample (OOS) regions.
We propose a framework for learning policies modeled by contractive dynamical systems.
arXiv Detail & Related papers (2024-12-10T14:28:18Z)
- Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts [0.15889427269227555]
We develop an adaptive re-training algorithm inspired by evolutionary game theory (EGT).
ERPO shows faster policy adaptation, higher average rewards, and reduced computational costs.
arXiv Detail & Related papers (2024-10-22T09:29:53Z)
- Preference Elicitation for Offline Reinforcement Learning [59.136381500967744]
We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm.
Our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy.
arXiv Detail & Related papers (2024-06-26T15:59:13Z)
- A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning [3.1515473193934778]
Off-dynamics Reinforcement Learning seeks to transfer a policy from a source environment to a target environment characterized by distinct yet similar dynamics.
We propose an innovative approach inspired by recent advancements in Imitation Learning and conservative RL algorithms.
arXiv Detail & Related papers (2023-12-24T13:09:08Z)
- Variance Reduction based Experience Replay for Policy Optimization [3.0790370651488983]
Variance Reduction Experience Replay (VRER) is a framework for the selective reuse of relevant samples to improve policy gradient estimation.
VRER forms the foundation of our sample efficient off-policy learning algorithm known as Policy Gradient with VRER.
arXiv Detail & Related papers (2021-10-17T19:28:45Z)
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs [71.47895794305883]
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning setting.
We present an SPI algorithm for this RL setting that accounts for the user's preferences when handling the trade-offs between different reward signals.
arXiv Detail & Related papers (2021-05-31T21:04:21Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Non-Stationary Off-Policy Optimization [50.41335279896062]
We study the novel problem of off-policy optimization in piecewise-stationary contextual bandits.
In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state.
In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance.
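The online phase described above, adaptively switching between pre-trained sub-policies based on observed performance, can be sketched as a simple bandit-style selector. This is an illustrative assumption about the mechanism, not the paper's exact algorithm; the class name and parameters are hypothetical:

```python
import random

class SubPolicySwitcher:
    """Adaptively switches between pre-trained sub-policies online.

    Keeps a running mean reward per sub-policy and picks the current
    best, exploring uniformly with probability `epsilon`. The
    sub-policies themselves (one per latent state learned offline)
    are assumed to be given.
    """
    def __init__(self, n_policies, epsilon=0.1, seed=0):
        self.counts = [0] * n_policies
        self.means = [0.0] * n_policies
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        # Explore until every sub-policy has been tried at least once.
        if self.rng.random() < self.epsilon or 0 in self.counts:
            return self.rng.randrange(len(self.means))
        return max(range(len(self.means)), key=lambda i: self.means[i])

    def update(self, policy_idx, reward):
        # Incremental running-mean update for the chosen sub-policy.
        self.counts[policy_idx] += 1
        self.means[policy_idx] += (reward - self.means[policy_idx]) / self.counts[policy_idx]

# Usage: sub-policy 1 yields higher reward, so the switcher converges to it.
switcher = SubPolicySwitcher(n_policies=2)
for _ in range(500):
    i = switcher.select()
    switcher.update(i, 1.0 if i == 1 else 0.0)
```

Under piecewise-stationary dynamics, the running means track whichever sub-policy matches the current latent state, so the selector shifts automatically after a change point.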
arXiv Detail & Related papers (2020-06-15T09:16:09Z)
- Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification.
Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.