Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems
- URL: http://arxiv.org/abs/2506.23090v2
- Date: Wed, 09 Jul 2025 09:50:43 GMT
- Title: Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems
- Authors: Langming Liu, Wanyu Wang, Chi Zhang, Bo Li, Hongzhi Yin, Xuetao Wei, Wenbo Su, Bo Zheng, Xiangyu Zhao
- Abstract summary: Current offline reinforcement learning (RL) methods face substantial challenges when applied to sparse advertising scenarios. We propose MTORL, a novel multi-task offline RL model that targets two key objectives. We employ multi-task learning to decode actions and rewards, simultaneously addressing channel recommendation and budget allocation.
- Score: 54.709976343045824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online advertising in recommendation platforms has gained significant attention, with a predominant focus on channel recommendation and budget allocation strategies. However, current offline reinforcement learning (RL) methods face substantial challenges when applied to sparse advertising scenarios, primarily due to severe overestimation, distributional shifts, and overlooking budget constraints. To address these issues, we propose MTORL, a novel multi-task offline RL model that targets two key objectives. First, we establish a Markov Decision Process (MDP) framework specific to the nuances of advertising. Then, we develop a causal state encoder to capture dynamic user interests and temporal dependencies, facilitating offline RL through conditional sequence modeling. Causal attention mechanisms are introduced to enhance user sequence representations by identifying correlations among causal states. We employ multi-task learning to decode actions and rewards, simultaneously addressing channel recommendation and budget allocation. Notably, our framework includes an automated system for integrating these tasks into online advertising. Extensive experiments on offline and online environments demonstrate MTORL's superiority over state-of-the-art methods.
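The abstract outlines the main architectural pieces: a causal state encoder over the user's interaction sequence and multi-task heads that decode actions (channel recommendation) and rewards (a budget-related signal). Below is a minimal sketch of that idea, not the authors' implementation; the Transformer backbone, module names, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): a causal state encoder with
# multi-task decoding heads, loosely following the MTORL abstract.
import torch
import torch.nn as nn


class CausalStateEncoder(nn.Module):
    """Encodes a user's interaction sequence with causal (masked) self-attention."""

    def __init__(self, num_items: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model)
        self.pos_emb = nn.Embedding(512, d_model)  # assumed maximum sequence length
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        # item_ids: (batch, seq_len) user interaction history
        seq_len = item_ids.size(1)
        pos = torch.arange(seq_len, device=item_ids.device)
        x = self.item_emb(item_ids) + self.pos_emb(pos)
        # Causal mask so each state attends only to past interactions.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                     device=item_ids.device), diagonal=1)
        return self.encoder(x, mask=mask)  # (batch, seq_len, d_model)


class MultiTaskHeads(nn.Module):
    """Decodes actions (channel recommendation) and rewards (budget signal) per step."""

    def __init__(self, d_model: int, num_channels: int):
        super().__init__()
        self.action_head = nn.Linear(d_model, num_channels)  # channel logits
        self.reward_head = nn.Linear(d_model, 1)              # predicted reward / spend

    def forward(self, states: torch.Tensor):
        return self.action_head(states), self.reward_head(states)


# Usage: encode a batch of user histories, then decode both tasks jointly.
encoder = CausalStateEncoder(num_items=1000)
heads = MultiTaskHeads(d_model=64, num_channels=5)
history = torch.randint(0, 1000, (8, 20))  # 8 users, 20 interactions each
channel_logits, reward_pred = heads(encoder(history))
```

In a conditional sequence-modeling setup such as the one the abstract describes, heads like these would be trained on offline advertising logs, with the causal mask ensuring each state summarizes only past interactions.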
Related papers
- EGA-V1: Unifying Online Advertising with End-to-End Learning [17.943921299281207]
We present EGA-V1, an end-to-end generative architecture that unifies online advertising ranking as one model. EGA-V1 replaces cascaded stages with a single model to directly generate optimal ad sequences from the full candidate ad corpus.
arXiv Detail & Related papers (2025-05-26T09:33:54Z) - Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
Offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts. This is the first work to explicitly address the distributional gap between offline and online MARL.
arXiv Detail & Related papers (2025-05-09T11:42:31Z) - Self-Regulation and Requesting Interventions [63.5863047447313]
We propose an offline framework that trains a "helper" policy to request interventions. We score optimal intervention timing with PRMs and train the helper model on these labeled trajectories. This offline approach significantly reduces costly intervention calls during training.
arXiv Detail & Related papers (2025-02-07T00:06:17Z) - An Offline Multi-Agent Reinforcement Learning Framework for Radio Resource Management [5.771885923067511]
Offline multi-agent reinforcement learning (MARL) addresses key limitations of online MARL. We propose an offline MARL algorithm for radio resource management (RRM). We evaluate three training paradigms: centralized, independent, and centralized training with decentralized execution (CTDE).
arXiv Detail & Related papers (2025-01-22T16:25:46Z) - Offline Multitask Representation Learning for Reinforcement Learning [86.26066704016056]
We study offline multitask representation learning in reinforcement learning (RL).
We propose a new algorithm called MORL for offline multitask representation learning.
Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline task instead of directly learning the representation of the low-rank model.
arXiv Detail & Related papers (2024-03-18T08:50:30Z) - Trajectory-wise Iterative Reinforcement Learning Framework for Auto-bidding [16.556934508295456]
In online advertising, advertisers participate in ad auctions to acquire ad opportunities, often using auto-bidding tools provided by demand-side platforms (DSPs).
Due to safety concerns, most RL-based auto-bidding policies are trained in simulation, leading to a performance degradation when deployed in online environments.
We propose Trajectory-wise Exploration and Exploitation (TEE), which introduces a novel data collecting and data utilization method for iterative offline RL.
arXiv Detail & Related papers (2024-02-23T05:20:23Z) - Deploying Offline Reinforcement Learning with Human Feedback [34.11507483049087]
Reinforcement learning has shown promise for decision-making tasks in real-world applications.
One practical framework involves training parameterized policy models from an offline dataset and deploying them in an online environment.
This approach can be risky since the offline training may not be perfect, leading to poor performance of the RL models that may take dangerous actions.
We propose an alternative framework that involves a human supervising the RL models and providing additional feedback in the online deployment phase.
arXiv Detail & Related papers (2023-03-13T12:13:16Z) - No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution [48.27759561064771]
We consider the two-echelon supply chain model introduced in [Cachon and Zipkin, 1999] under two different settings.
We design algorithms that achieve favorable guarantees for both regret and convergence to the optimal inventory decision in both settings.
Our algorithms are based on Online Gradient Descent and Online Newton Step, together with several new ingredients specifically designed for our problem.
arXiv Detail & Related papers (2022-10-23T08:45:39Z) - Multi-objective Optimization of Notifications Using Offline Reinforcement Learning [1.2303635283131926]
We formulate the near-real-time notification decision problem as a Markov Decision Process.
We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions.
arXiv Detail & Related papers (2022-07-07T00:53:08Z) - EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL [48.552287941528]
Off-policy reinforcement learning holds the promise of sample-efficient learning of decision-making policies.
In the offline RL setting, standard off-policy RL methods can significantly underperform.
We introduce Expected-Max Q-Learning (EMaQ), a backup operator that is more closely related to the resulting practical algorithm (a sketch of the EMaQ backup appears after this list).
arXiv Detail & Related papers (2020-07-21T21:13:02Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC); a two-head sketch appears after this list.
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
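The EMaQ entry above names a concrete backup operator: instead of maximizing Q over all actions, it maximizes over N actions sampled from the behavior policy, keeping the backup in-distribution. The following is a minimal sketch of that backup under assumed interfaces (`q_net`, `behavior_policy.sample`); it illustrates the operator and is not the paper's code.

```python
# Hypothetical sketch of the Expected-Max Q-Learning (EMaQ) backup: sample N
# candidate actions from a (learned) behavior policy at the next state and
# back up the max Q-value over those samples, avoiding out-of-distribution actions.
import torch


def emaq_target(q_net, behavior_policy, reward, next_state, done,
                num_samples: int = 10, gamma: float = 0.99) -> torch.Tensor:
    """Compute the EMaQ backup target for a batch of transitions.

    Assumed interfaces: q_net(states, actions) -> (batch,) Q-values;
    behavior_policy.sample(states, n) -> (batch, n, action_dim) actions.
    """
    with torch.no_grad():
        batch_size = next_state.size(0)
        # (batch, N, action_dim) candidate actions drawn from the behavior policy
        candidates = behavior_policy.sample(next_state, num_samples)
        # Evaluate Q on every candidate by flattening the sample dimension.
        flat_states = next_state.unsqueeze(1).expand(-1, num_samples, -1).reshape(
            batch_size * num_samples, -1)
        flat_actions = candidates.reshape(batch_size * num_samples, -1)
        q_values = q_net(flat_states, flat_actions).view(batch_size, num_samples)
        # Max over the N in-distribution samples, not over the full action space.
        max_q = q_values.max(dim=1).values
        return reward + gamma * (1.0 - done) * max_q
```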
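The self-supervised RL entry describes a sequential recommendation model with two output layers, one trained with a self-supervised loss and one with RL. A minimal sketch of that two-head layout follows; the GRU backbone, the simplified TD loss, and all names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the SQN-style two-head layout: a shared sequence encoder
# feeds a self-supervised head (next-item cross-entropy) and a Q-learning head.
import torch
import torch.nn as nn


class TwoHeadRecModel(nn.Module):
    def __init__(self, num_items: int, hidden: int = 64):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, hidden)
        self.backbone = nn.GRU(hidden, hidden, batch_first=True)
        self.supervised_head = nn.Linear(hidden, num_items)  # next-item logits
        self.q_head = nn.Linear(hidden, num_items)           # Q-value per item/action

    def forward(self, item_seq: torch.Tensor):
        out, _ = self.backbone(self.item_emb(item_seq))
        state = out[:, -1]  # last hidden state serves as the RL state
        return self.supervised_head(state), self.q_head(state)


# Joint loss: cross-entropy on the next item plus a TD-style error on the Q head.
model = TwoHeadRecModel(num_items=1000)
seq = torch.randint(0, 1000, (8, 10))
next_item = torch.randint(0, 1000, (8,))
reward = torch.rand(8)
logits, q = model(seq)
ce_loss = nn.functional.cross_entropy(logits, next_item)
# Simplified one-step target using only the reward (no bootstrapping), for illustration.
td_loss = ((q.gather(1, next_item.unsqueeze(1)).squeeze(1) - reward) ** 2).mean()
loss = ce_loss + td_loss
```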
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.