Related papers: Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems

URL: http://arxiv.org/abs/2208.04560v2
Date: Wed, 10 Aug 2022 04:17:14 GMT
Title: Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems
Authors: Qihua Zhang, Junning Liu, Yuzhuo Dai, Yiyan Qi, Yifan Yuan, Kunlun Zheng, Fan Huang, Xianfeng Tan
Abstract summary: We propose a Batch Reinforcement Learning based Multi-Task Fusion framework (BatchRL-MTF). We learn an optimal recommendation policy offline from fixed batch data for long-term user satisfaction. Based on a comprehensive investigation of user behaviors, we model the user satisfaction reward with subtle heuristics from two aspects: user stickiness and user activeness.
Score: 3.4394890850129007
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recommender Systems (RS) are important online applications that affect billions of users every day. The mainstream RS ranking framework is composed of two parts: a Multi-Task Learning model (MTL) that predicts various kinds of user feedback, e.g., clicks, likes, and shares, and a Multi-Task Fusion model (MTF) that combines the multi-task outputs into one final ranking score with respect to user satisfaction. There has not been much research on the fusion model, although it has a great impact on the final recommendation as the last crucial step of ranking. To optimize long-term user satisfaction rather than greedily pursue instant returns, we formulate the MTF task as a Markov Decision Process (MDP) within a recommendation session and propose a Batch Reinforcement Learning (RL) based Multi-Task Fusion framework (BatchRL-MTF) that includes a Batch RL framework and an online exploration component. The former exploits Batch RL to learn an optimal recommendation policy offline from fixed batch data for long-term user satisfaction, while the latter explores potentially high-value actions online to escape local optima. Based on a comprehensive investigation of user behaviors, we model the user satisfaction reward with subtle heuristics from two aspects: user stickiness and user activeness. Finally, we conduct extensive experiments on a billion-sample real-world dataset to show the effectiveness of our model. We propose a conservative offline policy estimator (Conservative-OPEstimator) to test our model offline. Furthermore, we run online experiments in a real recommendation environment to compare the performance of different models. As one of the few Batch RL approaches successfully applied to the MTF task, our model has also been deployed on a large-scale industrial short video platform, serving hundreds of millions of users.
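
Illustrative sketch (not from the paper): the abstract does not state the fusion formula or the reward definition, so the following Python example only illustrates the general idea of fusing multi-task predictions into one ranking score and of valuing a session by its discounted return. The weighted-product formula, the task names, the weights, and the helper names fuse_scores and discounted_return are all assumptions made for illustration.

```python
from math import prod


def fuse_scores(predictions, weights):
    """Combine per-task predictions into one ranking score.

    Assumed formula (hypothetical, not taken from the paper):
    a weighted product, score = prod(p_task ** w_task).
    """
    return prod(predictions[task] ** weights[task] for task in weights)


def discounted_return(rewards, gamma=0.9):
    """Discounted sum of per-step rewards over one recommendation session."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Hypothetical MTL outputs for two candidate videos.
candidates = {
    "video_a": {"click": 0.30, "like": 0.05, "share": 0.01},
    "video_b": {"click": 0.20, "like": 0.10, "share": 0.04},
}

# Hypothetical fusion weights; in an RL-based MTF setup, weights like
# these would be the action chosen by the policy for the current state.
weights = {"click": 1.0, "like": 0.5, "share": 0.5}

# Rank candidates by fused score, highest first.
ranking = sorted(
    candidates,
    key=lambda item: fuse_scores(candidates[item], weights),
    reverse=True,
)
print(ranking)
```

With these assumed weights, video_b outranks video_a even though its predicted click rate is lower, because its like and share predictions are higher. A policy that maximizes discounted_return over a session, rather than the immediate reward alone, is the sense in which an MDP formulation targets long-term satisfaction.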

Related papers

Feature Fusion Revisited: Multimodal CTR Prediction for MMCTR Challenge [4.3058911704400415]
The EReL@MIR workshop provided a valuable opportunity to experiment with various approaches aimed at improving the efficiency of multimodal representation learning. Our team was honored to receive the award for Task 2 - Winner (Multimodal CTR Prediction).
arXiv Detail & Related papers (2025-04-26T16:04:33Z)
xMTF: A Formula-Free Model for Reinforcement-Learning-Based Multi-Task Fusion in Recommender Systems [9.531326558213276]
A recommender system handling multiple types of feedback has two components: a multi-task learning (MTL) module, predicting feedback such as click-through rate and like rate; and a multi-task fusion (MTF) module, integrating these predictions into a single score for item ranking. In this paper, we propose a formula-free MTF framework and introduce a novel learnable monotonic fusion cell (MFC) to replace pre-defined formulas. We demonstrate that any suitable fusion function can be expressed as a composition of single-variable monotonic functions, as per the Sprecher Representation Theorem.
arXiv Detail & Related papers (2025-04-08T04:28:22Z)
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning [76.35753243272521]
We introduce VisualPRM, which improves the reasoning abilities of existing Multimodal Large Language Models (MLLMs). Our model achieves a 5.9-point improvement across seven multimodal reasoning benchmarks. For the evaluation of multimodal PRMs, we propose VisualProcessBench, a benchmark with human-annotated step-wise correctness labels.
arXiv Detail & Related papers (2025-03-13T12:03:37Z)
Residual Multi-Task Learner for Applied Ranking [11.774841918446137]
ResFlow is a lightweight multi-task learning framework that enables efficient cross-task information sharing. It is fully deployed in the pre-rank module of Shopee Search.
arXiv Detail & Related papers (2024-10-30T06:49:45Z)
An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems [12.277443583840963]
We propose a novel method called Enhanced-State RL for Multi-Task Fusion (MTF) in Recommender Systems (RSs). Our method first defines user features, item features, and other valuable features collectively as the enhanced state; it then proposes a novel actor and critic learning process that uses the enhanced state to choose a much better action for each user-item pair.
arXiv Detail & Related papers (2024-09-18T03:34:31Z)
Personalized Multi-task Training for Recommender System [80.23030752707916]
PMTRec is the first personalized multi-task learning algorithm to obtain comprehensive user/item embeddings from various information sources. Our contributions open new avenues for advancing personalized multi-task training in recommender systems.
arXiv Detail & Related papers (2024-07-31T06:27:06Z)
An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems [19.443149691831856]
Multi-Task Fusion (MTF) is responsible for combining multiple scores outputted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) is used for MTF in the industry. In this paper, we propose a novel method named IntegratedRL-MTF customized for MTF in large-scale RSs.
arXiv Detail & Related papers (2024-04-19T08:43:03Z)
On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a user simulator in recommendation based on Large Language Models. Each agent interacts with personalized recommender models in a page-by-page manner.
arXiv Detail & Related papers (2023-10-16T06:41:16Z)
Sequential Search with Off-Policy Reinforcement Learning [48.88165680363482]
We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model. As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly. We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
arXiv Detail & Related papers (2022-02-01T06:52:40Z)
Choosing the Best of Both Worlds: Diverse and Novel Recommendations through Multi-Objective Reinforcement Learning [68.45370492516531]
We introduce Scalarized Multi-Objective Reinforcement Learning (SMORL) for the Recommender Systems (RS) setting. The SMORL agent augments standard recommendation models with additional RL layers that force it to simultaneously satisfy three principal objectives: accuracy, diversity, and novelty of recommendations. Our experimental results on two real-world datasets reveal a substantial increase in aggregate diversity, a moderate increase in accuracy, and reduced repetitiveness of recommendations, and they demonstrate the importance of reinforcing diversity and novelty as complementary objectives.
arXiv Detail & Related papers (2021-10-28T13:22:45Z)
Multi-Faceted Hierarchical Multi-Task Learning for a Large Number of Tasks with Multi-dimensional Relations [10.326429525379181]
This work studies the "macro" perspective of shared learning network design and proposes a Multi-Faceted Hierarchical MTL model (MFH). MFH exploits the multi-dimensional task relations with a nested hierarchical tree structure that maximizes shared learning. We evaluate MFH and SOTA models on a large industry video platform with 10 billion samples, and results show that MFH significantly outperforms SOTA MTL models in both offline and online evaluations.
arXiv Detail & Related papers (2021-10-26T02:35:51Z)
Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
Controllable Multi-Interest Framework for Recommendation [64.30030600415654]
We formalize the recommender system as a sequential recommendation problem. We propose a novel controllable multi-interest framework for the sequential recommendation, called ComiRec. Our framework has been successfully deployed on the offline Alibaba distributed cloud platform.
arXiv Detail & Related papers (2020-05-19T10:18:43Z)
RNE: A Scalable Network Embedding for Billion-scale Recommendation [21.6366085346674]
We propose RNE, a data-efficient Recommendation-based Network Embedding method, to give personalized and diverse items to users. On the one hand, the method is able to preserve the local structure between the users and items while modeling the diversity and dynamic property of the user interest to boost the recommendation quality. We deploy RNE on a recommendation scenario of Taobao, the largest E-commerce platform in China, and train it on a billion-scale user-item graph.
arXiv Detail & Related papers (2020-03-10T07:08:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.