Reinforcing User Retention in a Billion Scale Short Video Recommender
System
- URL: http://arxiv.org/abs/2302.01724v2
- Date: Tue, 7 Feb 2023 04:12:02 GMT
- Title: Reinforcing User Retention in a Billion Scale Short Video Recommender
System
- Authors: Qingpeng Cai, Shuchang Liu, Xueliang Wang, Tianyou Zuo, Wentao Xie,
Bin Yang, Dong Zheng, Peng Jiang, Kun Gai
- Abstract summary: Short video platforms have achieved rapid user growth by recommending interesting content to users.
The objective of the recommendation is to optimize user retention, thereby driving the growth of DAU (Daily Active Users).
- Score: 21.681785801465328
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recently, short video platforms have achieved rapid user growth by
recommending interesting content to users. The objective of the recommendation
is to optimize user retention, thereby driving the growth of DAU (Daily Active
Users). Retention is a long-term feedback signal accumulated over multiple
interactions between users and the system, and the retention reward is hard to
decompose to each item or list of items. Thus, traditional point-wise and
list-wise models are not able to optimize retention. In this paper, we choose
reinforcement learning methods to optimize retention, as they are designed to
maximize long-term performance. We formulate the problem as an infinite-horizon
request-based Markov Decision Process, and our objective is to minimize the
accumulated time interval across multiple sessions, which is equivalent to
improving the app open frequency and user retention. However, current
reinforcement learning algorithms cannot be directly applied in this setting
due to the uncertainty, bias, and long delay incurred by the properties of user
retention. We propose a novel method, dubbed RLUR, to address the
aforementioned challenges. Both offline and live experiments show that RLUR
significantly improves user retention. RLUR has been fully deployed in the
Kuaishou app for a long time, and achieves consistent performance improvements
on user retention and DAU.
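The retention objective in the abstract can be sketched as a one-step temporal-difference update; this is an illustrative reconstruction, not the authors' implementation, and all names (`td_target`, `td_update`, the learning rate and discount values) are hypothetical. Each request is a state, the recommendation is the action, and the reward is the negative time gap until the user's next session, so maximizing the discounted return minimizes the accumulated inter-session time.

```python
# Illustrative sketch (not the paper's code) of optimizing retention
# as an infinite-horizon MDP: the reward is the negative time interval
# (e.g. in hours) until the user's next session, so a larger return
# means the user comes back sooner and more often.

def td_target(reward_neg_gap: float, next_q: float, gamma: float = 0.99) -> float:
    """One-step temporal-difference target for the retention reward."""
    return reward_neg_gap + gamma * next_q

def td_update(q: float, reward_neg_gap: float, next_q: float,
              lr: float = 0.1, gamma: float = 0.99) -> float:
    """Tabular-style Q update; a deep RL method would instead regress a
    value network toward the same target."""
    return q + lr * (td_target(reward_neg_gap, next_q, gamma) - q)

# Example: the user returned after 6 hours, so the reward is -6.0.
q_new = td_update(q=0.0, reward_neg_gap=-6.0, next_q=-40.0, lr=0.5)
```

A deep variant would replace the scalar `q` with a network over user/request features; the delayed, uncertain reward mentioned in the abstract is exactly what makes this target hard to estimate in practice.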
Related papers
- AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term
User Engagement [25.18963930580529]
We introduce a novel paradigm called Adaptive Sequential Recommendation (AdaRec) to address this issue.
AdaRec proposes a new distance-based representation loss to extract latent information from users' interaction trajectories.
We conduct extensive empirical analyses in both simulator-based and live sequential recommendation tasks.
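The "distance-based representation loss" in the AdaRec summary above can be illustrated with a minimal sketch: encourage pairwise distances between latent trajectory representations to track a reference distance defined on the raw interaction trajectories. The function name and the exact distance choice are assumptions for illustration, not AdaRec's actual loss.

```python
import numpy as np

# Hypothetical sketch of a distance-based representation loss: the mean
# squared gap between pairwise latent L2 distances and given
# trajectory-level distances. z has shape [n, d]; traj_dist is [n, n].

def distance_alignment_loss(z: np.ndarray, traj_dist: np.ndarray) -> float:
    diff = z[:, None, :] - z[None, :, :]        # pairwise differences
    latent_dist = np.sqrt((diff ** 2).sum(-1))  # pairwise L2 distances
    return float(((latent_dist - traj_dist) ** 2).mean())

# Two latent points at distance 5 match a reference distance of 5,
# so the loss is zero.
z = np.array([[0.0, 0.0], [3.0, 4.0]])
ref = np.array([[0.0, 5.0], [5.0, 0.0]])
loss = distance_alignment_loss(z, ref)
```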
arXiv Detail & Related papers (2023-10-06T02:45:21Z)
- PrefRec: Recommender Systems with Human Preferences for Reinforcing
Long-term User Engagement [36.95056214316392]
We propose a novel paradigm: recommender systems with human preferences, or preference-based recommender systems (PrefRec).
With PrefRec, we can fully exploit the advantages of RL in optimizing long-term goals, while avoiding complex reward engineering.
arXiv Detail & Related papers (2022-12-06T06:21:17Z)
- Improving information retention in large scale online continual learning [99.73847522194549]
Online continual learning aims to adapt efficiently to new data while retaining existing knowledge.
Recent work suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited.
We propose using a moving average family of methods to improve optimization for non-stationary objectives.
arXiv Detail & Related papers (2022-10-12T16:59:43Z)
- Sequential Search with Off-Policy Reinforcement Learning [48.88165680363482]
We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model.
As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly.
We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
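The "greedy knapsack" step summarized above can be sketched as packing several short user sequences into fixed-length slots of a single RNN pass. The first-fit-decreasing policy and names below are assumptions for illustration, not the paper's implementation.

```python
# Illustrative greedy packing: fit as many short sequences as possible
# into bins of size `capacity` (one RNN pass each), longest-first.

def pack_sequences(lengths, capacity):
    """First-fit-decreasing bin packing; returns lists of sequence
    indices, one list per RNN pass."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    bins, residual = [], []
    for i in order:
        for b, free in enumerate(residual):
            if lengths[i] <= free:   # sequence fits in an open bin
                bins[b].append(i)
                residual[b] -= lengths[i]
                break
        else:                        # open a new bin for this sequence
            bins.append([i])
            residual.append(capacity - lengths[i])
    return bins

# Five sequences of lengths [5, 3, 4, 2, 2] packed into slots of 8
# need only two RNN passes instead of five.
packed = pack_sequences([5, 3, 4, 2, 2], capacity=8)
```

Packing this way amortizes the fixed cost of an RNN pass over several short sequences, which is the throughput gain the summary alludes to.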
arXiv Detail & Related papers (2022-02-01T06:52:40Z)
- Denoising User-aware Memory Network for Recommendation [11.145186013006375]
We propose a novel CTR model named Denoising User-aware Memory Network (DUMN).
DUMN uses the representation of explicit feedback to purify the representation of implicit feedback, effectively denoising the implicit feedback.
Experiments on two real e-commerce user behavior datasets show that DUMN has a significant improvement over the state-of-the-art baselines.
arXiv Detail & Related papers (2021-07-12T14:39:36Z)
- Improving Long-Term Metrics in Recommendation Systems using
Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
- Dynamic Memory based Attention Network for Sequential Recommendation [79.5901228623551]
We propose a novel long sequential recommendation model called Dynamic Memory-based Attention Network (DMAN).
It segments the overall long behavior sequence into a series of sub-sequences, then trains the model and maintains a set of memory blocks to preserve long-term interests of users.
Based on the dynamic memory, the user's short-term and long-term interests can be explicitly extracted and combined for efficient joint recommendation.
arXiv Detail & Related papers (2021-02-18T11:08:54Z)
- Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve the attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
- Reward Constrained Interactive Recommendation with Natural Language
Feedback [158.8095688415973]
We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time.
Specifically, we leverage a discriminator to detect recommendations violating user historical preference.
Our proposed framework is general and is further extended to the task of constrained text generation.
arXiv Detail & Related papers (2020-05-04T16:23:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.