Reinforcing User Retention in a Billion Scale Short Video Recommender System
- URL: http://arxiv.org/abs/2302.01724v2
- Date: Tue, 7 Feb 2023 04:12:02 GMT
- Title: Reinforcing User Retention in a Billion Scale Short Video Recommender System
- Authors: Qingpeng Cai, Shuchang Liu, Xueliang Wang, Tianyou Zuo, Wentao Xie,
Bin Yang, Dong Zheng, Peng Jiang, Kun Gai
- Abstract summary: Short video platforms have achieved rapid user growth by recommending interesting content to users.
The objective of the recommendation is to optimize user retention, thereby driving the growth of DAU (Daily Active Users).
- Score: 21.681785801465328
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recently, short video platforms have achieved rapid user growth by
recommending interesting content to users. The objective of the recommendation
is to optimize user retention, thereby driving the growth of DAU (Daily Active
Users). Retention is a long-term signal produced after multiple interactions between
users and the system, and it is hard to decompose the retention reward into each item
or list of items. Thus, traditional point-wise and list-wise models cannot
optimize retention. In this paper, we choose reinforcement learning methods to
optimize the retention as they are designed to maximize the long-term
performance. We formulate the problem as an infinite-horizon request-based
Markov Decision Process, and our objective is to minimize the accumulated time
interval of multiple sessions, which is equal to improving the app open
frequency and user retention. However, current reinforcement learning
algorithms cannot be directly applied in this setting due to the uncertainty,
bias, and long delay introduced by the properties of user retention. We
propose a novel method, dubbed RLUR, to address the aforementioned challenges.
Both offline and live experiments show that RLUR can significantly improve user
retention. RLUR has been fully deployed in the Kuaishou app for a long time, and
achieves consistent improvements in user retention and DAU.
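The abstract's objective (minimizing the accumulated time interval between sessions) can be cast as a standard return over a request-level trajectory whose per-step reward is the negative inter-session gap. The sketch below is an illustrative reconstruction under that reading, not the paper's actual implementation; the `Transition` structure and field names are assumptions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    """One request-level step of the assumed MDP.

    gap_to_next_session: time (e.g. hours) until the user's next
    session, observed only at a session boundary; 0.0 within a session.
    """
    gap_to_next_session: float

def retention_return(trajectory: List[Transition], gamma: float = 0.99) -> float:
    """Discounted return with reward = -inter-session time gap.

    Maximizing this return is equivalent to minimizing the accumulated
    time interval between sessions, i.e. encouraging the user to open
    the app again sooner.
    """
    g = 0.0
    for t in reversed(trajectory):
        g = -t.gap_to_next_session + gamma * g
    return g
```

With `gamma = 1.0`, a trajectory whose only nonzero gap is 2.0 hours yields a return of -2.0, so policies that shorten the gap strictly increase the return.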
Related papers
- Unveiling User Satisfaction and Creator Productivity Trade-Offs in Recommendation Platforms [68.51708490104687]
We show that a purely relevance-driven policy with low exploration strength boosts short-term user satisfaction but undermines the long-term richness of the content pool.
Our findings reveal a fundamental trade-off between immediate user satisfaction and overall content production on platforms.
arXiv Detail & Related papers (2024-10-31T07:19:22Z)
- Prompt Tuning as User Inherent Profile Inference Machine [53.78398656789463]
We propose UserIP-Tuning, which uses prompt-tuning to infer user profiles.
A profile quantization codebook bridges the modality gap by quantizing profile embeddings into collaborative IDs.
Experiments on four public datasets show that UserIP-Tuning outperforms state-of-the-art recommendation algorithms.
arXiv Detail & Related papers (2024-08-13T02:25:46Z)
- Modeling User Retention through Generative Flow Networks [34.74982897470852]
A flow-based modeling technique can back-propagate the retention reward to each recommended item in the user session.
We show that the flow, combined with traditional learning-to-rank objectives, optimizes a non-discounted cumulative reward covering both immediate user feedback and user retention.
arXiv Detail & Related papers (2024-06-10T06:22:18Z)
- PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement [36.95056214316392]
We propose a novel paradigm: recommender systems with human preferences (preference-based recommender systems).
With PrefRec, we can fully exploit the advantages of RL in optimizing long-term goals, while avoiding complex reward engineering.
arXiv Detail & Related papers (2022-12-06T06:21:17Z)
- Improving information retention in large scale online continual learning [99.73847522194549]
Online continual learning aims to adapt efficiently to new data while retaining existing knowledge.
Recent work suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited.
We propose using a moving average family of methods to improve optimization for non-stationary objectives.
arXiv Detail & Related papers (2022-10-12T16:59:43Z)
- Sequential Search with Off-Policy Reinforcement Learning [48.88165680363482]
We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model.
As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly.
We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
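The batching trick described above (fitting multiple short user sequences into a single RNN pass via an on-the-fly greedy knapsack) can be sketched as first-fit-decreasing packing. This is an illustrative reconstruction, not that paper's actual implementation; the function name and `capacity` parameter (maximum unrolled RNN length per slot) are assumptions:

```python
from typing import List

def pack_sequences(lengths: List[int], capacity: int) -> List[List[int]]:
    """Greedily pack variable-length user sequences into fixed-capacity
    RNN slots using a first-fit-decreasing knapsack heuristic."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    slots: List[List[int]] = []   # each slot holds indices of packed sequences
    remaining: List[int] = []     # free capacity left in each slot
    for i in order:
        for s, free in enumerate(remaining):
            if lengths[i] <= free:
                slots[s].append(i)
                remaining[s] -= lengths[i]
                break
        else:
            # no existing slot fits this sequence; open a new one
            slots.append([i])
            remaining.append(capacity - lengths[i])
    return slots
```

Packing shorter sequences into the unused tail of a slot amortizes the fixed cost of one RNN unroll across several users, which is the scalability gain the summary refers to.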
arXiv Detail & Related papers (2022-02-01T06:52:40Z)
- Denoising User-aware Memory Network for Recommendation [11.145186013006375]
We propose a novel CTR model named denoising user-aware memory network (DUMN)
DUMN uses the representation of explicit feedback to purify the representation of implicit feedback, effectively denoising the implicit feedback.
Experiments on two real e-commerce user behavior datasets show that DUMN has a significant improvement over the state-of-the-art baselines.
arXiv Detail & Related papers (2021-07-12T14:39:36Z)
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
- Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.