Stratified Expert Cloning with Adaptive Selection for User Retention in Large-Scale Recommender Systems
- URL: http://arxiv.org/abs/2504.05628v1
- Date: Tue, 08 Apr 2025 03:10:42 GMT
- Title: Stratified Expert Cloning with Adaptive Selection for User Retention in Large-Scale Recommender Systems
- Authors: Chengzhi Lin, Annan Xie, Shuchang Liu, Wuhong Wang, Chuyuan Wang, Yongqi Liu,
- Abstract summary: Stratified Expert Cloning (SEC) is a novel imitation learning framework that effectively leverages logged data from high-retention users to learn robust recommendation policies. SEC introduces three key innovations: 1) a multi-level expert stratification strategy that captures the nuances in expert user behaviors at different retention levels; 2) an adaptive expert selection mechanism that dynamically assigns users to the most suitable policy based on their current state and historical retention level; and 3) an action entropy regularization technique that promotes recommendation diversity and mitigates the risk of policy collapse.
- Score: 2.0378554336804013
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: User retention has emerged as a critical challenge in large-scale recommender systems, significantly impacting the long-term success of online platforms. Existing methods often focus on short-term engagement metrics, failing to capture the complex dynamics of user preferences and behaviors over extended periods. While reinforcement learning (RL) approaches have shown promise in optimizing long-term rewards, they face difficulties in credit assignment, sample efficiency, and exploration when applied to the user retention problem. In this work, we propose Stratified Expert Cloning (SEC), a novel imitation learning framework that effectively leverages abundant logged data from high-retention users to learn robust recommendation policies. SEC introduces three key innovations: 1) a multi-level expert stratification strategy that captures the nuances in expert user behaviors at different retention levels; 2) an adaptive expert selection mechanism that dynamically assigns users to the most suitable policy based on their current state and historical retention level; and 3) an action entropy regularization technique that promotes recommendation diversity and mitigates the risk of policy collapse. Through extensive offline experiments and online A/B tests on two major video platforms, Kuaishou and Kuaishou Lite, with hundreds of millions of daily active users, we demonstrate SEC's significant improvements in user retention over state-of-the-art methods, with cumulative lifts of 0.098% and 0.122% in active days on Kuaishou and Kuaishou Lite respectively, additionally bringing tens of thousands of daily active users to each platform.
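To make the three innovations concrete, here is a minimal sketch of how they might fit together; the class names, dimensions, and entropy coefficient below are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch of SEC's three components (names and shapes assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_STRATA = 3                    # retention levels used to stratify experts (assumed)
STATE_DIM, NUM_ITEMS = 64, 1000   # toy dimensions

class StratumPolicy(nn.Module):
    """Behavior-cloning policy trained on one retention stratum's logs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, NUM_ITEMS))

    def forward(self, state):
        return self.net(state)    # logits over candidate items

# Innovation 1: one policy per expert stratum.
policies = nn.ModuleList([StratumPolicy() for _ in range(NUM_STRATA)])

def cloning_loss(logits, expert_action, entropy_coef=0.01):
    """Innovation 3: imitation loss plus an action-entropy bonus that
    keeps the action distribution diverse and guards against collapse."""
    ce = F.cross_entropy(logits, expert_action)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()
    return ce - entropy_coef * entropy

def select_policy(retention_level):
    """Innovation 2, reduced here to routing by historical retention level;
    the paper also conditions the assignment on the user's current state."""
    return policies[min(retention_level, NUM_STRATA - 1)]

# Toy training step on logged (state, action) pairs from stratum-2 experts.
states = torch.randn(32, STATE_DIM)
expert_actions = torch.randint(0, NUM_ITEMS, (32,))
cloning_loss(policies[2](states), expert_actions).backward()
```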
Related papers
- Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments. Online RL-based recommender systems also face challenges in production deployment due to the risks of exposing users to untrained or unstable policies. Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline. We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
arXiv Detail & Related papers (2025-01-23T16:37:44Z)
- Multi-Objective Recommendation via Multivariate Policy Learning [10.494676556696213]
Real-world recommender systems often need to balance multiple objectives when deciding which recommendations to present to users.
These include behavioural signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g. diversity, fairness).
arXiv Detail & Related papers (2024-05-03T14:44:04Z)
- Leave No One Behind: Online Self-Supervised Self-Distillation for Sequential Recommendation [20.52842524024608]
Sequential recommendation methods play a pivotal role in modern recommendation systems.
Recent methods leverage contrastive learning to derive self-supervision signals.
We introduce a novel learning paradigm, named Online Self-Supervised Self-distillation for Sequential Recommendation.
arXiv Detail & Related papers (2024-03-22T12:27:21Z)
- UOEP: User-Oriented Exploration Policy for Enhancing Long-Term User Experiences in Recommender Systems [7.635117537731915]
Reinforcement learning (RL) has gained traction for enhancing user long-term experiences in recommender systems.
Modern recommender systems exhibit distinct user behavioral patterns among tens of millions of items, which increases the difficulty of exploration.
We propose User-Oriented Exploration Policy (UOEP), a novel approach facilitating fine-grained exploration among user groups.
arXiv Detail & Related papers (2024-01-17T08:01:18Z)
- Optimizing Credit Limit Adjustments Under Adversarial Goals Using Reinforcement Learning [42.303733194571905]
We seek to find and automate an optimal credit card limit adjustment policy by employing reinforcement learning techniques.
Our research establishes a conceptual structure for applying a reinforcement learning framework to credit limit adjustment.
arXiv Detail & Related papers (2023-06-27T16:10:36Z)
- Reinforcing User Retention in a Billion Scale Short Video Recommender System [21.681785801465328]
Short video platforms have achieved rapid user growth by recommending interesting content to users.
The objective of the recommendation is to optimize user retention, thereby driving the growth of DAU (Daily Active Users).
arXiv Detail & Related papers (2023-02-03T13:25:43Z)
- Personalizing Intervened Network for Long-tailed Sequential User Behavior Modeling [66.02953670238647]
Tail users suffer from significantly lower-quality recommendations than head users after joint training.
A model trained separately on tail users still achieves inferior results due to limited data.
We propose a novel approach that significantly improves the recommendation performance of tail users.
arXiv Detail & Related papers (2022-08-19T02:50:19Z)
- Data augmentation for efficient learning from parametric experts [88.33380893179697]
We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert to inform the behavior of a student policy.
Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories.
We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degrees-of-freedom control problems.
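The summary above suggests a simple recipe: perturb logged expert states and re-query the expert at those synthetic states. A rough sketch under that reading, where `expert_policy`, `noise_scale`, and `n_copies` are illustrative assumptions:

```python
# Sketch of APC-style augmentation: label synthetic neighbors of logged
# states by querying the expert, so the student sees local action changes.
import numpy as np

def augment_dataset(states, expert_policy, noise_scale=0.05, n_copies=4):
    """Return (states, actions) with n_copies Gaussian-perturbed
    neighbors per logged state, all labeled by the expert."""
    neighbors = [states] + [
        states + noise_scale * np.random.randn(*states.shape)
        for _ in range(n_copies)
    ]
    aug_states = np.concatenate(neighbors, axis=0)
    aug_actions = np.stack([expert_policy(s) for s in aug_states])
    return aug_states, aug_actions

# Toy usage: 8-dim control states, 4-dim continuous expert actions.
states = np.random.randn(128, 8)
expert_policy = lambda s: np.tanh(s[:4])
aug_states, aug_actions = augment_dataset(states, expert_policy)
```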
arXiv Detail & Related papers (2022-05-23T16:37:16Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
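One plausible reading of the confidence-based pseudo-labeling step is sketched below; the predictor interface, threshold, and data layout are assumptions rather than SURF's published details:

```python
# Keep an unlabeled segment pair only when the preference predictor is
# confident, and use its hard 0/1 decision as the pseudo-label.
import torch

def pseudo_label_pairs(predictor, seg_a, seg_b, threshold=0.9):
    """predictor(seg_a, seg_b) -> P(seg_a preferred) per pair, in [0, 1]."""
    with torch.no_grad():
        p = predictor(seg_a, seg_b)
    confident = (p > threshold) | (p < 1 - threshold)
    labels = (p > 0.5).float()        # 1 means seg_a is preferred
    return seg_a[confident], seg_b[confident], labels[confident]

# Toy usage: trajectory segments flattened to feature vectors.
seg_a, seg_b = torch.randn(256, 50), torch.randn(256, 50)
predictor = lambda a, b: torch.sigmoid((a - b).mean(dim=1))
kept_a, kept_b, pseudo = pseudo_label_pairs(predictor, seg_a, seg_b)
```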
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
- Maximizing Cumulative User Engagement in Sequential Recommendation: An Online Optimization Perspective [26.18096797120916]
Recommenders often need to trade off two potentially conflicting objectives: pursuing higher immediate user engagement and encouraging longer user browsing.
We propose a flexible and practical framework to explicitly trade off longer user browsing length and high immediate user engagement.
Deployed at a large e-commerce platform, this approach achieved over 7% improvement in cumulative clicks.
arXiv Detail & Related papers (2020-06-02T09:02:51Z)
- Empowering Active Learning to Jointly Optimize System and User Demands [70.66168547821019]
We propose a new active learning approach that jointly optimizes the active learning system (training efficiently) and the user (receiving useful instances).
We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user.
We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives when alternative methods lead to many unsuitable exercises for end users.
arXiv Detail & Related papers (2020-05-09T16:02:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.