Reinforcing User Retention in a Billion Scale Short Video Recommender
System
- URL: http://arxiv.org/abs/2302.01724v2
- Date: Tue, 7 Feb 2023 04:12:02 GMT
- Title: Reinforcing User Retention in a Billion Scale Short Video Recommender
System
- Authors: Qingpeng Cai, Shuchang Liu, Xueliang Wang, Tianyou Zuo, Wentao Xie,
Bin Yang, Dong Zheng, Peng Jiang, Kun Gai
- Abstract summary: Short video platforms have achieved rapid user growth by recommending interesting content to users.
The objective of the recommendation is to optimize user retention, thereby driving the growth of DAU (Daily Active Users).
- Score: 21.681785801465328
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recently, short video platforms have achieved rapid user growth by
recommending interesting content to users. The objective of the recommendation
is to optimize user retention, thereby driving the growth of DAU (Daily Active
Users). Retention is a long-term feedback signal accumulated over multiple
interactions between users and the system, and the retention reward is hard to
decompose to each item or list of items. Thus, traditional point-wise and
list-wise models are not able to optimize retention. In this paper, we choose
reinforcement learning methods to optimize retention, as they are designed to
maximize long-term performance. We formulate the problem as an infinite-horizon
request-based Markov Decision Process, and our objective is to minimize the
accumulated time interval across multiple sessions, which is equivalent to
improving the app open frequency and user retention. However, current
reinforcement learning algorithms cannot be directly applied in this setting
due to the uncertainty, bias, and long delay incurred by the properties of user
retention. We propose a novel method, dubbed RLUR, to address the
aforementioned challenges. Both offline and live experiments show that RLUR
significantly improves user retention. RLUR has been fully deployed in the
Kuaishou app for a long time, and achieves consistent performance improvements
on user retention and DAU.
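The retention objective in the abstract can be sketched as a one-step temporal-difference update; this is an illustrative reconstruction, not the authors' implementation, and all names (`td_target`, `td_update`, the learning rate and discount values) are hypothetical. Each request is a state, the recommendation is the action, and the reward is the negative time gap until the user's next session, so maximizing the discounted return minimizes the accumulated inter-session time.

```python
# Illustrative sketch (not the paper's code) of optimizing retention
# as an infinite-horizon MDP: the reward is the negative time interval
# (e.g. in hours) until the user's next session, so a larger return
# means the user comes back sooner and more often.

def td_target(reward_neg_gap: float, next_q: float, gamma: float = 0.99) -> float:
    """One-step temporal-difference target for the retention reward."""
    return reward_neg_gap + gamma * next_q

def td_update(q: float, reward_neg_gap: float, next_q: float,
              lr: float = 0.1, gamma: float = 0.99) -> float:
    """Tabular-style Q update; a deep RL method would instead regress a
    value network toward the same target."""
    return q + lr * (td_target(reward_neg_gap, next_q, gamma) - q)

# Example: the user returned after 6 hours, so the reward is -6.0.
q_new = td_update(q=0.0, reward_neg_gap=-6.0, next_q=-40.0, lr=0.5)
```

A deep variant would replace the scalar `q` with a network over user/request features; the delayed, uncertain reward mentioned in the abstract is exactly what makes this target hard to estimate in practice.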
Related papers
- AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term
User Engagement [25.18963930580529]
We introduce a novel paradigm called Adaptive Sequential Recommendation (AdaRec) to address this issue.
AdaRec proposes a new distance-based representation loss to extract latent information from users' interaction trajectories.
We conduct extensive empirical analyses in both simulator-based and live sequential recommendation tasks.
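The "distance-based representation loss" in the AdaRec summary above can be illustrated with a minimal sketch: encourage pairwise distances between latent trajectory representations to track a reference distance defined on the raw interaction trajectories. The function name and the exact distance choice are assumptions for illustration, not AdaRec's actual loss.

```python
import numpy as np

# Hypothetical sketch of a distance-based representation loss: the mean
# squared gap between pairwise latent L2 distances and given
# trajectory-level distances. z has shape [n, d]; traj_dist is [n, n].

def distance_alignment_loss(z: np.ndarray, traj_dist: np.ndarray) -> float:
    diff = z[:, None, :] - z[None, :, :]        # pairwise differences
    latent_dist = np.sqrt((diff ** 2).sum(-1))  # pairwise L2 distances
    return float(((latent_dist - traj_dist) ** 2).mean())

# Two latent points at distance 5 match a reference distance of 5,
# so the loss is zero.
z = np.array([[0.0, 0.0], [3.0, 4.0]])
ref = np.array([[0.0, 5.0], [5.0, 0.0]])
loss = distance_alignment_loss(z, ref)
```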
arXiv Detail & Related papers (2023-10-06T02:45:21Z)
- PrefRec: Recommender Systems with Human Preferences for Reinforcing
Long-term User Engagement [36.95056214316392]
We propose a novel paradigm: recommender systems with human preferences, or preference-based recommender systems (PrefRec).
With PrefRec, we can fully exploit the advantages of RL in optimizing long-term goals, while avoiding complex reward engineering.
arXiv Detail & Related papers (2022-12-06T06:21:17Z)
- Improving information retention in large scale online continual learning [99.73847522194549]
Online continual learning aims to adapt efficiently to new data while retaining existing knowledge.
Recent work suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited.
We propose using a moving average family of methods to improve optimization for non-stationary objectives.
arXiv Detail & Related papers (2022-10-12T16:59:43Z)
- Sequential Search with Off-Policy Reinforcement Learning [48.88165680363482]
We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model.
As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly.
We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
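The "greedy knapsack" step summarized above can be sketched as packing several short user sequences into fixed-length slots of a single RNN pass. The first-fit-decreasing policy and names below are assumptions for illustration, not the paper's implementation.

```python
# Illustrative greedy packing: fit as many short sequences as possible
# into bins of size `capacity` (one RNN pass each), longest-first.

def pack_sequences(lengths, capacity):
    """First-fit-decreasing bin packing; returns lists of sequence
    indices, one list per RNN pass."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    bins, residual = [], []
    for i in order:
        for b, free in enumerate(residual):
            if lengths[i] <= free:   # sequence fits in an open bin
                bins[b].append(i)
                residual[b] -= lengths[i]
                break
        else:                        # open a new bin for this sequence
            bins.append([i])
            residual.append(capacity - lengths[i])
    return bins

# Five sequences of lengths [5, 3, 4, 2, 2] packed into slots of 8
# need only two RNN passes instead of five.
packed = pack_sequences([5, 3, 4, 2, 2], capacity=8)
```

Packing this way amortizes the fixed cost of an RNN pass over several short sequences, which is the throughput gain the summary alludes to.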
arXiv Detail & Related papers (2022-02-01T06:52:40Z)
- Denoising User-aware Memory Network for Recommendation [11.145186013006375]
We propose a novel CTR model named Denoising User-aware Memory Network (DUMN).
DUMN uses the representation of explicit feedback to purify the representation of implicit feedback, effectively denoising the implicit feedback.
Experiments on two real e-commerce user behavior datasets show that DUMN has a significant improvement over the state-of-the-art baselines.
arXiv Detail & Related papers (2021-07-12T14:39:36Z)
- Improving Long-Term Metrics in Recommendation Systems using
Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
- Dynamic Memory based Attention Network for Sequential Recommendation [79.5901228623551]
We propose a novel long sequential recommendation model called Dynamic Memory-based Attention Network (DMAN).
It segments the overall long behavior sequence into a series of sub-sequences, then trains the model and maintains a set of memory blocks to preserve long-term interests of users.
Based on the dynamic memory, the user's short-term and long-term interests can be explicitly extracted and combined for efficient joint recommendation.
arXiv Detail & Related papers (2021-02-18T11:08:54Z)
- Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve the attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
- Reward Constrained Interactive Recommendation with Natural Language
Feedback [158.8095688415973]
We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations violating user historical preference.
Our proposed framework is general and is further extended to the task of constrained text generation.
arXiv Detail & Related papers (2020-05-04T16:23:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.