ResAct: Reinforcing Long-term Engagement in Sequential Recommendation
with Residual Actor
- URL: http://arxiv.org/abs/2206.02620v2
- Date: Fri, 16 Jun 2023 08:38:52 GMT
- Title: ResAct: Reinforcing Long-term Engagement in Sequential Recommendation
with Residual Actor
- Authors: Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai,
Bo An
- Abstract summary: ResAct seeks a policy that is close to, but better than, the online-serving policy.
We conduct experiments on a benchmark dataset and a large-scale industrial dataset.
Results show that our method significantly outperforms the state-of-the-art baselines in various long-term engagement optimization tasks.
- Score: 36.0251263322305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-term engagement is preferred over immediate engagement in sequential
recommendation as it directly affects product operational metrics such as daily
active users (DAUs) and dwell time. Meanwhile, reinforcement learning (RL) is
widely regarded as a promising framework for optimizing long-term engagement in
sequential recommendation. However, due to expensive online interactions, it is
very difficult for RL algorithms to perform state-action value estimation,
exploration and feature extraction when optimizing long-term engagement. In
this paper, we propose ResAct which seeks a policy that is close to, but better
than, the online-serving policy. In this way, we can collect sufficient data
near the learned policy so that state-action values can be properly estimated,
and there is no need to perform online exploration. ResAct optimizes the policy
by first reconstructing the online behaviors and then improving it via a
Residual Actor. To extract long-term information, ResAct utilizes two
information-theoretical regularizers to confirm the expressiveness and
conciseness of features. We conduct experiments on a benchmark dataset and a
large-scale industrial dataset which consists of tens of millions of
recommendation requests. Experimental results show that our method
significantly outperforms the state-of-the-art baselines in various long-term
engagement optimization tasks.
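The abstract describes an actor that first reconstructs the online-serving policy's behavior and then improves it with a residual correction. The sketch below illustrates that decomposition in Python/PyTorch; the network sizes, the simple additive composition, and the module names are assumptions for illustration, not the paper's exact architecture (which also includes two information-theoretic feature regularizers).
```python
import torch
import torch.nn as nn

class ResidualActor(nn.Module):
    """Sketch of a two-step actor: reconstruct the online action, then add a residual."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        # Imitates what the online-serving policy would do in this state.
        self.reconstructor = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, action_dim)
        )
        # Predicts a correction on top of the reconstructed action.
        self.residual = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Linear(hidden, action_dim)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        base = self.reconstructor(state)                          # reconstruct online behavior
        delta = self.residual(torch.cat([state, base], dim=-1))   # learn an improvement
        return base + delta                                       # close to, but better than, the online policy
```
Because the residual is a correction on top of the reconstructed behavior, the learned policy stays near the data-collecting policy, which is what allows state-action values to be estimated from logged data without online exploration.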
Related papers
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Experiments show significant performance improvements for our method compared with several well-known baselines.
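The high-level/low-level split described above lends itself to a simple two-module structure. A minimal, hypothetical sketch of that idea (the module names, the GRU encoder, and the dot-product scorer are illustrative assumptions, not mccHRL's actual architecture):
```python
import torch
import torch.nn as nn

class HighLevelAgent(nn.Module):
    """Summarizes the evolution of user perception over a session into a goal vector."""
    def __init__(self, state_dim: int, goal_dim: int):
        super().__init__()
        self.encoder = nn.GRU(state_dim, goal_dim, batch_first=True)

    def forward(self, session_states: torch.Tensor) -> torch.Tensor:
        # session_states: (batch, steps, state_dim) -> goal: (batch, goal_dim)
        _, hidden = self.encoder(session_states)
        return hidden[-1]

class LowLevelAgent(nn.Module):
    """Scores candidate items for the current list, conditioned on the high-level goal."""
    def __init__(self, goal_dim: int, item_dim: int, hidden: int = 64):
        super().__init__()
        self.goal_proj = nn.Linear(goal_dim, hidden)
        self.item_proj = nn.Linear(item_dim, hidden)

    def forward(self, goal: torch.Tensor, items: torch.Tensor) -> torch.Tensor:
        # goal: (batch, goal_dim); items: (batch, n_items, item_dim) -> scores: (batch, n_items)
        g = self.goal_proj(goal).unsqueeze(1)
        return (g * self.item_proj(items)).sum(dim=-1)
```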
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
- An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation [14.506332665769746]
We propose an Efficient Continuous Control framework (ECoC).
Based on a statistically tested assumption, we first propose the novel unified action representation abstracted from normalized user and item spaces.
During this process, strategic exploration and directional control in terms of unified actions are carefully designed and crucial to final recommendation decisions.
arXiv Detail & Related papers (2024-08-15T09:26:26Z)
- Learning Goal-Conditioned Policies from Sub-Optimal Offline Data via Metric Learning [22.174803826742963]
We address the problem of learning optimal behavior from sub-optimal datasets for goal-conditioned offline reinforcement learning.
We propose the use of metric learning to approximate the optimal value function for goal-conditioned offline RL problems.
We show that our method estimates optimal behaviors from severely sub-optimal offline datasets without suffering from out-of-distribution estimation errors.
arXiv Detail & Related papers (2024-02-16T16:46:53Z)
- AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement [25.18963930580529]
We introduce a novel paradigm called Adaptive Sequential Recommendation (AdaRec) to address this issue.
AdaRec proposes a new distance-based representation loss to extract latent information from users' interaction trajectories.
We conduct extensive empirical analyses in both simulator-based and live sequential recommendation tasks.
arXiv Detail & Related papers (2023-10-06T02:45:21Z)
- Recommending the optimal policy by learning to act from temporal data [2.554326189662943]
This paper proposes an AI-based approach that learns, by means of Reinforcement Learning (RL), to recommend the optimal policy from temporal execution data.
The approach is validated on real and synthetic datasets and compared with off-policy Deep RL approaches.
The ability of our approach to match, and often outperform, Deep RL approaches is a contribution towards exploiting white-box RL techniques in scenarios where only temporal execution data are available.
arXiv Detail & Related papers (2023-03-16T10:30:36Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a 2.5x improvement over existing approaches.
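One simple way to leverage offline data when learning online, as the entry above describes, is to mix offline and online transitions in each update batch. A minimal sketch under assumed interfaces (the 50/50 split, the dict-of-arrays batch format, and the buffer .sample() method are illustrative assumptions, not necessarily the paper's design choices):
```python
import numpy as np

def sample_mixed_batch(offline_buffer, online_buffer, batch_size: int = 256) -> dict:
    """Draw half of each update batch from logged offline data and half from online data."""
    half = batch_size // 2
    offline = offline_buffer.sample(half)              # assumed: returns a dict of arrays
    online = online_buffer.sample(batch_size - half)
    # Concatenate field-wise so any off-policy learner can consume the batch unchanged.
    return {key: np.concatenate([offline[key], online[key]], axis=0) for key in offline}
```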
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
- Benchmarks for Deep Off-Policy Evaluation [152.28569758144022]
We present a collection of policies that can be used for benchmarking off-policy evaluation.
The goal of our benchmark is to provide a standardized measure of progress that is motivated from a set of principles.
We provide open-source access to our data and code to foster future research in this area.
arXiv Detail & Related papers (2021-03-30T18:09:33Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
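The CRR entry above fits the policy by regressing onto dataset actions, weighted by the critic so that poorly rated actions are down-weighted. Below is a minimal sketch of one common variant of that idea, exponential advantage weighting with a sampled value baseline; the function signatures, tensor shapes, and the weight clipping are illustrative assumptions rather than the paper's exact formulation.
```python
import torch

def crr_actor_loss(policy, q_net, states, actions, beta: float = 1.0, n_samples: int = 4):
    """Critic-weighted behavior cloning loss (illustrative sketch).

    Assumptions: policy(states) returns a distribution whose log_prob over a full
    action has shape (batch,), and q_net(states, actions) returns shape (batch,).
    """
    dist = policy(states)
    with torch.no_grad():
        q = q_net(states, actions)                                    # Q(s, a) for logged actions
        sampled = dist.sample((n_samples,))                           # actions from the current policy
        v = torch.stack([q_net(states, a) for a in sampled]).mean(0)  # Monte Carlo baseline V(s)
        weights = torch.exp((q - v) / beta).clamp(max=20.0)           # advantage weights (clipped for stability)
    return -(weights * dist.log_prob(actions)).mean()                 # weighted log-likelihood of dataset actions
```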