Model-based Constrained MDP for Budget Allocation in Sequential
Incentive Marketing
- URL: http://arxiv.org/abs/2303.01049v1
- Date: Thu, 2 Mar 2023 08:10:45 GMT
- Title: Model-based Constrained MDP for Budget Allocation in Sequential
Incentive Marketing
- Authors: Shuai Xiao, Le Guo, Zaifan Jiang, Lei Lv, Yuanbo Chen, Jun Zhu, Shuang
Yang
- Abstract summary: Sequential incentive marketing is an important approach for online businesses to acquire customers, increase loyalty and boost sales.
How to effectively allocate the incentives so as to maximize the return under the budget constraint is less studied in the literature.
We propose an efficient learning algorithm which combines bisection search and model-based planning.
- Score: 28.395877073390434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential incentive marketing is an important approach for online businesses
to acquire customers, increase loyalty and boost sales. How to effectively
allocate the incentives so as to maximize the return (e.g., business
objectives) under the budget constraint, however, is less studied in the
literature. This problem is technically challenging because 1) the allocation
strategy has to be learned from historically logged data, which is
counterfactual in nature, and 2) both the optimality and feasibility (i.e.,
that the cost cannot exceed the budget) need to be assessed before deployment to
online systems. In this paper, we formulate the problem as a constrained Markov
decision process (CMDP). To solve the CMDP problem with logged counterfactual
data, we propose an efficient learning algorithm which combines bisection
search and model-based planning. First, the CMDP is converted into its dual
using Lagrangian relaxation, which is proved to be monotonic with respect to
the dual variable. Furthermore, we show that the dual problem can be solved by
policy learning, with the optimal dual variable being found efficiently via
bisection search (i.e., by taking advantage of the monotonicity). Lastly, we
show that model-based planning can be used to effectively accelerate the joint
optimization process without retraining the policy for every dual variable.
Empirical results on synthetic and real marketing datasets confirm the
effectiveness of our methods.
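As a rough sketch of the bisection step described in the abstract (relying on the stated monotonicity of the dual), the snippet below searches for the dual variable at which a cost estimate meets the budget. Here `expected_cost`, its toy curve, and all parameter values are illustrative assumptions standing in for model-based evaluation of the Lagrangian-optimal policy's expected spend; this is not the paper's implementation.

```python
def bisect_dual(expected_cost, budget, lo=0.0, hi=10.0, tol=1e-6):
    """Find a dual variable whose induced policy respects the budget.

    Assumes expected_cost(lam) is monotonically non-increasing in lam
    (a larger multiplier penalizes spending more heavily) and that
    expected_cost(hi) <= budget at the initial upper bound.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_cost(mid) > budget:
            lo = mid   # policy still overspends: raise the penalty
        else:
            hi = mid   # within budget: try a smaller penalty
    return hi

# Toy stand-in for model-based evaluation of expected spend
# (purely illustrative, not the paper's learned model).
toy_cost = lambda lam: 100.0 / (1.0 + lam)
lam_star = bisect_dual(toy_cost, budget=20.0)   # ~4.0 for this toy curve
```

Because `hi` is only lowered when its cost estimate is within budget, the returned multiplier is always feasible under the monotonicity assumption; model-based planning would replace `toy_cost` so the policy need not be retrained for every candidate multiplier.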
Related papers
- Decision Focused Causal Learning for Direct Counterfactual Marketing Optimization [21.304040539486184]
Decision Focused Learning (DFL) integrates machine learning (ML) and optimization into an end-to-end framework.
However, deploying DFL in marketing is non-trivial due to multiple technological challenges.
We propose a decision focused causal learning framework (DFCL) for direct counterfactual marketing.
arXiv Detail & Related papers (2024-07-18T16:39:44Z)
- Data-Driven Knowledge Transfer in Batch $Q^*$ Learning [5.6665432569907646]
We explore knowledge transfer in dynamic decision-making by concentrating on batch stationary environments.
We propose a framework of Transferred Fitted $Q$-Iteration algorithm with general function approximation.
We show that the final learning error of the $Q^*$ function is significantly improved over the single-task rate.
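The Transferred Fitted $Q$-Iteration entry above builds on standard fitted Q-iteration; purely as an illustration of that batch backbone (not the paper's transfer mechanism or its general function approximation), here is a minimal tabular sketch over logged (s, a, r, s') transitions, with all states, actions, and hyperparameters hypothetical:

```python
def fitted_q_iteration(batch, n_states, n_actions, gamma=0.9, iters=50):
    """Tabular fitted Q-iteration on a logged batch of (s, a, r, s2)
    transitions: repeatedly regress Q(s, a) onto the bootstrapped
    targets r + gamma * max_a' Q(s2, a')."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        targets = {}
        for s, a, r, s2 in batch:
            targets.setdefault((s, a), []).append(r + gamma * max(Q[s2]))
        newQ = [row[:] for row in Q]
        for (s, a), ys in targets.items():
            newQ[s][a] = sum(ys) / len(ys)  # tabular "fit" = mean target
        Q = newQ
    return Q

# Toy batch: in state 0, action 1 earns reward 1 and stays; everything
# else leads to an absorbing zero-reward state 1.
batch = [(0, 1, 1.0, 0), (0, 0, 0.0, 1), (1, 0, 0.0, 1), (1, 1, 0.0, 1)]
Q = fitted_q_iteration(batch, n_states=2, n_actions=2)
```

On this toy MDP, Q[0][1] converges toward 1 / (1 - gamma) = 10, while the other entries stay at zero.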
arXiv Detail & Related papers (2024-04-01T02:20:09Z)
- Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z)
- Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing [20.9377115817821]
Marketing is an important mechanism to increase user engagement and improve platform revenue.
Most decision-making problems in marketing can be formulated as resource allocation problems and have been studied for decades.
Existing works usually divide the solution procedure into two fully decoupled stages, i.e., machine learning (ML) and operations research (OR).
arXiv Detail & Related papers (2022-11-28T19:27:34Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Regularized Online Allocation Problems: Fairness and Beyond [7.433931244705934]
We introduce the regularized online allocation problem, a variant that includes a non-linear regularizer acting on the total resource consumption.
In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources.
The objective is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer subject to the resource constraints.
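One standard way to handle such resource constraints online is to maintain a dual price on the resource and accept a request only when its reward beats the priced cost. The sketch below is a toy single-resource dual-descent allocator with no regularizer; the request format, step size, and pacing rule are all illustrative assumptions, not the paper's algorithm.

```python
def online_allocation(requests, budget, eta=0.1):
    """Toy dual-descent allocator: requests are (reward, cost) pairs
    arriving one at a time against a single resource budget."""
    mu = 0.0                         # dual price on the resource
    rho = budget / len(requests)     # target per-round consumption pace
    spent, value = 0.0, 0.0
    for reward, cost in requests:
        take = reward - mu * cost > 0 and spent + cost <= budget
        if take:
            spent += cost
            value += reward
        # Dual update: raise the price when consuming above pace,
        # lower it (toward zero) when consuming below pace.
        mu = max(0.0, mu + eta * ((cost if take else 0.0) - rho))
    return value, spent
```

The hard check `spent + cost <= budget` keeps the allocator feasible even while the dual price is still adapting; the paper's setting additionally folds a non-separable regularizer into the objective, which this toy omits.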
arXiv Detail & Related papers (2020-07-01T14:24:58Z)
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
- Exploration-Exploitation in Constrained MDPs [79.23623305214275]
We investigate the exploration-exploitation dilemma in Constrained Markov Decision Processes (CMDPs).
While learning in an unknown CMDP, an agent must trade off exploring to discover new information about the MDP against exploiting its current knowledge.
Although the agent will eventually learn a good or optimal policy, we do not want it to violate the constraints too often during the learning process.
arXiv Detail & Related papers (2020-03-04T17:03:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.