Marketing Budget Allocation with Offline Constrained Deep Reinforcement
Learning
- URL: http://arxiv.org/abs/2309.02669v1
- Date: Wed, 6 Sep 2023 02:35:46 GMT
- Title: Marketing Budget Allocation with Offline Constrained Deep Reinforcement
Learning
- Authors: Tianchi Cai, Jiyan Jiang, Wenpeng Zhang, Shiji Zhou, Xierui Song, Li
Yu, Lihong Gu, Xiaodong Zeng, Jinjie Gu, Guannan Zhang
- Abstract summary: We study the budget allocation problem in online marketing campaigns that utilize previously collected offline data.
We propose a novel game-theoretic offline value-based reinforcement learning method using mixed policies.
- Score: 22.993339296954545
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study the budget allocation problem in online marketing
campaigns that utilize previously collected offline data. We first discuss
the long-term effect of optimizing marketing budget allocation decisions in
the offline setting. To overcome the resulting challenges, we propose a novel
game-theoretic offline value-based reinforcement learning method using mixed
policies. The proposed method reduces the number of policies that must be
stored from the infinitely many required by previous methods to only a
constant number, achieving nearly optimal policy efficiency and making it
practical for industrial use. We further show that this method is guaranteed
to converge to the optimal policy, a guarantee that previous value-based
reinforcement learning methods for marketing budget allocation cannot
provide. Our experiments on a large-scale marketing campaign with tens of
millions of users and a budget of more than one billion verify the
theoretical results and show that the proposed method outperforms various
baseline methods. The proposed method has been successfully deployed to serve
all the traffic of this marketing campaign.
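The abstract's central construct, a mixed policy over constantly many base policies, can be illustrated with a short sketch: store a small fixed set of deterministic value-based policies plus mixing weights, and sample one per decision. The Q-tables and all class and function names below are assumptions for illustration only, not the paper's implementation.

```python
import numpy as np

class MixedPolicy:
    """A mixed policy: a fixed distribution over constantly many base policies."""

    def __init__(self, base_policies, weights, seed=0):
        assert len(base_policies) == len(weights)
        self.base_policies = list(base_policies)
        self.weights = np.asarray(weights, dtype=float)
        self.weights /= self.weights.sum()  # normalize to a probability distribution
        self.rng = np.random.default_rng(seed)

    def act(self, state):
        # Sample one base policy, then act deterministically with it.
        idx = self.rng.choice(len(self.base_policies), p=self.weights)
        return self.base_policies[idx](state)

def greedy_policy(q_table):
    # Deterministic value-based policy: pick the action with the largest Q-value.
    return lambda state: int(np.argmax(q_table[state]))

# Toy example: 5 user states, 3 candidate budget levels per user.
rng = np.random.default_rng(42)
q_a = rng.normal(size=(5, 3))  # Q-table of one value-based base policy
q_b = rng.normal(size=(5, 3))  # Q-table of another
policy = MixedPolicy([greedy_policy(q_a), greedy_policy(q_b)], weights=[0.7, 0.3])
print([policy.act(s) for s in range(5)])
```

Only the mixing weights and the constant-size policy set need to be stored and served, which is what makes this representation attractive at industrial scale.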
Related papers
- Metalearners for Ranking Treatment Effects [1.469168639465869]
We show how learning to rank can maximize the area under a policy's incremental profit curve.
arXiv Detail & Related papers (2024-05-03T15:31:18Z)
- IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse [50.90781542323258]
Reinforcement learning (RL) agents can transfer knowledge from source policies to a related target task.
Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions.
We propose a novel transfer RL method that selects the source policy without training extra components.
arXiv Detail & Related papers (2023-08-14T09:22:35Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- An End-to-End Framework for Marketing Effectiveness Optimization under Budget Constraint [25.89397524825504]
We propose a novel end-to-end framework to directly optimize the business goal under budget constraints.
Our core idea is to construct a regularizer that represents the marketing goal and to optimize it efficiently using gradient estimation techniques (a sketch of this idea appears after this list).
Our proposed method is currently deployed to allocate marketing budgets for hundreds of millions of users on a short video platform.
arXiv Detail & Related papers (2023-02-09T07:39:34Z)
- A Profit-Maximizing Strategy for Advertising on the e-Commerce Platforms [1.565361244756411]
The proposed model aims to find the optimal set of features to maximize the probability of converting targeted audiences into actual buyers.
We conduct an empirical study featuring real-world data from Tmall to show that our proposed method can effectively optimize the advertising strategy with budgetary constraints.
arXiv Detail & Related papers (2022-10-31T01:45:42Z)
- Adversarial Learning for Incentive Optimization in Mobile Payment Marketing [17.645000197183045]
Payment platforms hold large-scale marketing campaigns, which allocate incentives to encourage users to pay through their applications.
To maximize the return on investment, incentive allocations are commonly solved in a two-stage procedure.
We propose a bias correction adversarial network to overcome the bias that arises in this two-stage procedure.
arXiv Detail & Related papers (2021-12-28T07:54:39Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Universal Trading for Order Execution with Oracle Policy Distillation [99.57416828489568]
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that, with an oracle teacher that has perfect information, our framework can better guide the learning of the common policy towards practically optimal execution.
arXiv Detail & Related papers (2021-01-28T05:52:18Z)
- Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising [52.3825928886714]
We formulate the sequential advertising strategy optimization as a dynamic knapsack problem (a minimal sketch of the underlying recurrence appears after this list).
We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization problem.
To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach.
arXiv Detail & Related papers (2020-06-29T18:50:35Z)
- Heterogeneous Causal Learning for Effectiveness Optimization in User Marketing [2.752817022620644]
We propose a treatment effect optimization methodology for user marketing.
This algorithm learns from past experiments and utilizes novel optimization methods to optimize cost efficiency with respect to user selection.
Our proposed constrained and direct optimization algorithms outperform the best-performing prior-art and baseline methods by 24.6%.
arXiv Detail & Related papers (2020-04-21T01:34:34Z)
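Two of the entries above describe ideas concrete enough to sketch. First, the end-to-end framework's core idea of optimizing a regularized marketing goal with gradient estimation: the snippet below uses a score-function (REINFORCE-style) estimator on a toy Bernoulli treatment policy. The sigmoid policy, the penalty form, and every name here are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def score_function_gradient(theta, users, budget, lam, n_samples=256, seed=0):
    """REINFORCE-style gradient of E[value - lam * max(0, cost - budget)]
    under a Bernoulli treatment policy pi_theta(treat | user)."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        p = 1.0 / (1.0 + np.exp(-users @ theta))  # treatment probabilities
        treat = rng.random(len(p)) < p            # sampled allocation
        value = (treat * users[:, 0]).sum()       # toy uplift proxy
        cost = treat.sum()                        # toy cost: 1 unit per treated user
        objective = value - lam * max(0.0, cost - budget)  # regularized goal
        # d log pi / d theta for Bernoulli(sigmoid(users @ theta))
        dlogp = users.T @ (treat - p)
        grad += objective * dlogp
    # Plain Monte Carlo average; a variance-reduction baseline is omitted for brevity.
    return grad / n_samples

users = np.random.default_rng(1).normal(size=(100, 4))
theta = np.zeros(4)
for _ in range(50):  # gradient ascent on the regularized objective
    theta += 0.01 * score_function_gradient(theta, users, budget=30, lam=1.0)
```

Second, the dynamic knapsack formulation of sequential advertising. That paper solves a far richer bilevel problem; the sketch below shows only the textbook 0/1 knapsack recurrence that such a formulation builds on, with toy values and integer costs.

```python
def knapsack(values, costs, budget):
    """0/1 knapsack DP: best[b] = max value achievable with total cost <= b."""
    best = [0.0] * (budget + 1)
    for v, c in zip(values, costs):
        # Iterate budgets downwards so each item is used at most once.
        for b in range(budget, c - 1, -1):
            best[b] = max(best[b], best[b - c] + v)
    return best[budget]

# Toy example: expected value and integer cost per advertising opportunity.
print(knapsack(values=[3.0, 4.5, 2.2, 5.1], costs=[2, 3, 1, 4], budget=6))
```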
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.