Multi-Task Fusion via Reinforcement Learning for Long-Term User
Satisfaction in Recommender Systems
- URL: http://arxiv.org/abs/2208.04560v2
- Date: Wed, 10 Aug 2022 04:17:14 GMT
- Title: Multi-Task Fusion via Reinforcement Learning for Long-Term User
Satisfaction in Recommender Systems
- Authors: Qihua Zhang, Junning Liu, Yuzhuo Dai, Yiyan Qi, Yifan Yuan, Kunlun
Zheng, Fan Huang, Xianfeng Tan
- Abstract summary: We propose a Batch Reinforcement Learning based Multi-Task Fusion framework (BatchRL-MTF).
We learn an optimal recommendation policy offline from fixed batch data for long-term user satisfaction.
Based on a comprehensive investigation of user behaviors, we model the user satisfaction reward with subtle heuristics from two aspects: user stickiness and user activeness.
- Score: 3.4394890850129007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A Recommender System (RS) is an important online application that affects
billions of users every day. The mainstream RS ranking framework is composed of
two parts: a Multi-Task Learning model (MTL) that predicts various user
feedback, e.g., clicks, likes, and shares, and a Multi-Task Fusion model (MTF)
that combines the multi-task outputs into one final ranking score with respect
to user satisfaction. Although the fusion model, as the last crucial step of
ranking, has a great impact on the final recommendation, it has received little
research attention. To optimize long-term user satisfaction rather than
greedily obtain instant returns, we formulate the MTF task as a Markov Decision
Process (MDP) within a recommendation session and propose a Batch Reinforcement
Learning (RL) based Multi-Task Fusion framework (BatchRL-MTF) that includes a
Batch RL framework and an online exploration module. The former exploits Batch RL
to learn an optimal recommendation policy offline from fixed batch data for
long-term user satisfaction, while the latter explores potentially high-value
actions online to escape local optima. Based on a comprehensive investigation
of user behaviors, we model the user satisfaction reward with subtle heuristics
from two aspects: user stickiness and user activeness.
Finally, we conduct extensive experiments on a billion-sample real-world
dataset to show the effectiveness of our model. We propose a conservative
offline policy estimator (Conservative-OPEstimator) to test our model offline.
Furthermore, we conduct online experiments in a real recommendation environment
to compare the performance of different models. As one of the few Batch RL
studies successfully applied to the MTF task, our model has also been deployed
on a large-scale industrial short video platform, serving hundreds of millions
of users.
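To make the fusion step and the session-level MDP concrete, here is a minimal Python sketch under stated assumptions. The fusion formula (a weighted product of the MTL predictions), the reward (watch time standing in for stickiness plus an interaction count for activeness, with hypothetical weights `alpha` and `beta`), and all names (`Candidate`, `fuse_score`, `rank`, `step_reward`, `discounted_return`) are illustrative assumptions, not the BatchRL-MTF implementation; the abstract does not specify them. The RL action is taken to be the vector of fusion weights chosen for a request, and a policy is judged by the discounted return over the session rather than by each request's instant reward.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """One candidate item with its (hypothetical) MTL predictions."""
    item_id: str
    preds: Dict[str, float]  # e.g. {"click": 0.3, "like": 0.05}


def fuse_score(preds: Dict[str, float], weights: Dict[str, float],
               eps: float = 1e-6) -> float:
    """Fuse multi-task predictions into one ranking score.

    Assumption: weighted-product fusion prod_k (p_k + eps) ** w_k. The
    abstract does not state which fusion formula BatchRL-MTF uses.
    """
    score = 1.0
    for task, w in weights.items():
        score *= (preds[task] + eps) ** w
    return score


def rank(cands: List[Candidate], weights: Dict[str, float]) -> List[str]:
    """Order one request's candidates by fused score; `weights` plays the role of the RL action."""
    ordered = sorted(cands, key=lambda c: fuse_score(c.preds, weights), reverse=True)
    return [c.item_id for c in ordered]


def step_reward(watch_time_s: float, interactions: int,
                alpha: float = 1.0, beta: float = 0.5) -> float:
    """Hypothetical per-request reward: stickiness (watch time) plus activeness (interactions)."""
    return alpha * watch_time_s + beta * interactions


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """Session return sum_t gamma**t * r_t: the long-term objective, not r_0 alone."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    cands = [
        Candidate("a", {"click": 0.30, "like": 0.02}),
        Candidate("b", {"click": 0.20, "like": 0.10}),
    ]
    print(rank(cands, {"click": 1.0, "like": 0.0}))  # ['a', 'b']
    print(rank(cands, {"click": 1.0, "like": 1.0}))  # ['b', 'a']
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

Changing only the weights flips the ranking of the same two items, which is why the choice of weights can serve as the action in this formulation. A Batch RL policy would learn that choice offline from logged sessions.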
Related papers
- An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems [12.277443583840963]
We propose a novel method called Enhanced-State RL for Multi-Task Fusion (MTF) in Recommender Systems (RSs).
Our method first defines user features, item features, and other valuable features collectively as the enhanced state; it then proposes a novel actor and critic learning process that uses the enhanced state to make much better actions for each user-item pair.
(2024-09-18T03:34:31Z)
- An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems [19.443149691831856]
Multi-Task Fusion (MTF) is responsible for combining multiple scores outputted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction.
Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) has been used for MTF in industry.
In this paper, we propose a novel method named IntegratedRL-MTF customized for MTF in large-scale RSs.
(2024-04-19T08:43:03Z)
- On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a recommendation user simulator based on Large Language Models.
Each agent interacts with personalized recommender models in a page-by-page manner.
(2023-10-16T06:41:16Z)
- Sequential Search with Off-Policy Reinforcement Learning [48.88165680363482]
We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model.
As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly.
We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
(2022-02-01T06:52:40Z)
- Choosing the Best of Both Worlds: Diverse and Novel Recommendations through Multi-Objective Reinforcement Learning [68.45370492516531]
We introduce Scalarized Multi-Objective Reinforcement Learning (SMORL) for the Recommender Systems (RS) setting.
The SMORL agent augments standard recommendation models with additional RL layers that force it to simultaneously satisfy three principal objectives: accuracy, diversity, and novelty of recommendations.
Our experimental results on two real-world datasets reveal a substantial increase in aggregate diversity, a moderate increase in accuracy, reduced repetitiveness of recommendations, and demonstrate the importance of reinforcing diversity and novelty as complementary objectives.
(2021-10-28T13:22:45Z)
- Multi-Faceted Hierarchical Multi-Task Learning for a Large Number of Tasks with Multi-dimensional Relations [10.326429525379181]
This work studies the "macro" perspective of shared learning network design and proposes a Multi-Faceted Hierarchical MTL model (MFH).
MFH exploits the multi-dimensional task relations with a nested hierarchical tree structure that maximizes shared learning.
We evaluate MFH and SOTA models on a large industrial video platform with 10 billion samples, and the results show that MFH significantly outperforms SOTA MTL models in both offline and online evaluations.
(2021-10-26T02:35:51Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories that solved prior tasks from the storage.
(2021-10-25T20:02:57Z)
- Deep Latent Emotion Network for Multi-Task Learning [3.211310973369844]
We propose a Deep Latent Emotion Network (DLEN) model to extract latent probability of a user preferring a feed.
DLEN is deployed in a real-world multi-task feed recommendation scenario of Tencent QQ-Small-World with a dataset containing over a billion samples.
It exhibits a significant performance advantage over the SOTA MTL model in offline evaluation, together with considerable increases of 3.02% in view count and 2.63% in user stay time in production.
(2021-04-18T04:55:13Z)
- Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
(2020-06-10T11:18:57Z)
- Controllable Multi-Interest Framework for Recommendation [64.30030600415654]
We formalize the recommender system as a sequential recommendation problem.
We propose a novel controllable multi-interest framework for the sequential recommendation, called ComiRec.
Our framework has been successfully deployed on the offline Alibaba distributed cloud platform.
(2020-05-19T10:18:43Z)
- RNE: A Scalable Network Embedding for Billion-scale Recommendation [21.6366085346674]
We propose RNE, a data-efficient Recommendation-based Network Embedding method, to give personalized and diverse items to users.
The method preserves the local structure between users and items while modeling the diversity and dynamic nature of user interest to boost recommendation quality.
We deploy RNE on a recommendation scenario of Taobao, the largest E-commerce platform in China, and train it on a billion-scale user-item graph.
(2020-03-10T07:08:57Z)
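The Sequential Search with Off-Policy Reinforcement Learning entry mentions fitting several short user sequences into a single RNN pass by solving a greedy knapsack problem on the fly. The sketch below reads that as first-fit-decreasing packing of sequence lengths into fixed-capacity rows; this heuristic and the names `pack_sequences` and `capacity` are assumptions for illustration, and the paper's exact method may differ.

```python
from typing import List


def pack_sequences(lengths: List[int], capacity: int) -> List[List[int]]:
    """Pack sequence indices into rows holding at most `capacity` time steps.

    Assumed heuristic: first-fit decreasing. Sequences are visited from longest
    to shortest, and each goes into the first row with enough free room; a new
    row is opened only when none fits. Returns lists of indices into `lengths`.
    """
    if any(n > capacity for n in lengths):
        raise ValueError("every sequence must fit within one row")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    rows: List[List[int]] = []
    free: List[int] = []  # remaining capacity of each row
    for i in order:
        for r in range(len(rows)):
            if lengths[i] <= free[r]:
                rows[r].append(i)
                free[r] -= lengths[i]
                break
        else:  # no existing row had room, so open a new one
            rows.append([i])
            free.append(capacity - lengths[i])
    return rows


if __name__ == "__main__":
    print(pack_sequences([5, 3, 8, 2, 4], capacity=10))  # [[2, 3], [0, 4], [1]]
```

Here 22 time steps fit into three rows of 10, the minimum possible, whereas one row per sequence would need five. A real RNN pass would also need to keep packed sequences from sharing hidden state, for example by resetting it at each boundary; the summary above does not describe how that is handled.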