Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing
- URL: http://arxiv.org/abs/2303.01049v1
- Date: Thu, 2 Mar 2023 08:10:45 GMT
- Title: Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing
- Authors: Shuai Xiao, Le Guo, Zaifan Jiang, Lei Lv, Yuanbo Chen, Jun Zhu, Shuang Yang
- Abstract summary: Sequential incentive marketing is an important approach for online businesses to acquire customers, increase loyalty and boost sales.
How to effectively allocate the incentives so as to maximize the return under the budget constraint is less studied in the literature.
We propose an efficient learning algorithm which combines bisection search and model-based planning.
- Score: 28.395877073390434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential incentive marketing is an important approach for online businesses
to acquire customers, increase loyalty and boost sales. How to effectively
allocate the incentives so as to maximize the return (e.g., business
objectives) under the budget constraint, however, is less studied in the
literature. This problem is technically challenging because 1) the
allocation strategy has to be learned from historically logged data, which
is counterfactual in nature, and 2) both optimality and feasibility (i.e.,
that the cost cannot exceed the budget) need to be assessed before being deployed to
online systems. In this paper, we formulate the problem as a constrained Markov
decision process (CMDP). To solve the CMDP problem with logged counterfactual
data, we propose an efficient learning algorithm which combines bisection
search and model-based planning. First, the CMDP is converted into its dual
using Lagrangian relaxation, which is proved to be monotonic with respect to
the dual variable. Furthermore, we show that the dual problem can be solved by
policy learning, with the optimal dual variable being found efficiently via
bisection search (i.e., by taking advantage of the monotonicity). Lastly, we
show that model-based planning can be used to effectively accelerate the joint
optimization process without retraining the policy for every dual variable.
Empirical results on synthetic and real marketing datasets confirm the
effectiveness of our methods.
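The core reduction is the standard Lagrangian relaxation of a CMDP; the formulas below restate it in common notation (per-step reward r_t, cost c_t, budget B, policy π), which is assumed here rather than copied from the paper:

```latex
% Primal CMDP and its Lagrangian dual (standard form; notation assumed)
\max_{\pi}\; \mathbb{E}_{\pi}\!\Big[\textstyle\sum_{t} r_t\Big]
\quad \text{s.t.} \quad \mathbb{E}_{\pi}\!\Big[\textstyle\sum_{t} c_t\Big] \le B,
\qquad
g(\lambda) \;=\; \max_{\pi}\; \mathbb{E}_{\pi}\!\Big[\textstyle\sum_{t}\big(r_t - \lambda\, c_t\big)\Big] + \lambda B.
```

Because the expected cost of the policy maximizing the relaxed objective is monotonic in λ (the property the paper proves), the multiplier that exactly meets the budget can be bracketed by bisection. The sketch below is a minimal illustration of that loop, assuming hypothetical helpers `plan(lam)` (a model-based planner for the relaxed objective) and `expected_cost` (a model-based cost estimate); it is not the paper's implementation.

```python
# Minimal sketch (assumptions labeled): plan(lam) returns a policy that
# maximizes reward - lam * cost via model-based planning, and
# expected_cost(policy) estimates its cumulative cost under the model.

def bisection_dual_search(plan, expected_cost, budget,
                          lam_lo=0.0, lam_hi=100.0, tol=1e-6):
    """Bracket the smallest multiplier whose induced policy is on-budget,
    exploiting the monotonicity of the policy's cost in lambda."""
    while lam_hi - lam_lo > tol:
        lam = 0.5 * (lam_lo + lam_hi)
        policy = plan(lam)                  # planning step, no policy retraining
        if expected_cost(policy) > budget:  # infeasible -> penalize cost more
            lam_lo = lam
        else:                               # feasible -> try a smaller penalty
            lam_hi = lam
    return plan(lam_hi)                     # return the feasible side


if __name__ == "__main__":
    # Toy one-step instance: a "policy" is an incentive level a in [0, 1]
    # with reward sqrt(a) and cost a; maximizing sqrt(a) - lam * a gives
    # a = 1 / (4 * lam^2), which is monotone decreasing in lam.
    plan = lambda lam: min(1.0, 0.25 / (lam * lam)) if lam > 0 else 1.0
    policy = bisection_dual_search(plan, expected_cost=lambda a: a, budget=0.3)
    print(f"allocated incentive level: {policy:.4f}")  # ~= 0.3, on budget
```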
Related papers
- Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
We motivate an end-to-end decision-making model based on a multimodality-augmented LLM.
We propose a reasoning-decision alignment constraint between the paired CoTs and planning results.
We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver.
arXiv Detail & Related papers (2024-08-25T16:43:47Z)
- End-to-End Cost-Effective Incentive Recommendation under Budget Constraint with Uplift Modeling [12.160403526724476]
We propose a novel End-to-End Cost-Effective Incentive Recommendation (E3IR) model under budget constraints.
Specifically, our method consists of two modules, i.e., an uplift prediction module and a differentiable allocation module.
Our E3IR improves allocation performance compared to existing two-stage approaches.
arXiv Detail & Related papers (2024-08-21T13:48:00Z)
- Decision Focused Causal Learning for Direct Counterfactual Marketing Optimization [21.304040539486184]
Decision Focused Learning (DFL) integrates machine learning (ML) and optimization into an end-to-end framework.
However, deploying DFL in marketing is non-trivial due to multiple technological challenges.
We propose a decision focused causal learning framework (DFCL) for direct counterfactual marketing.
arXiv Detail & Related papers (2024-07-18T16:39:44Z)
- On Leveraging Large Language Models for Enhancing Entity Resolution: A Cost-efficient Approach [7.996010840316654]
We propose an uncertainty reduction framework using Large Language Models (LLMs) to improve entity resolution results.
LLMs capitalize on their advanced linguistic capabilities and a "pay-as-you-go" model that provides significant advantages to those without extensive data science expertise.
We show that our method is efficient and effective, offering promising applications in real-world tasks.
arXiv Detail & Related papers (2024-01-07T09:06:58Z)
- Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees that hold in the absence of the two aforementioned assumptions; a toy version of the underlying primal-dual template is sketched after this list.
arXiv Detail & Related papers (2023-02-02T16:30:33Z)
- Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing [20.9377115817821]
Marketing is an important mechanism to increase user engagement and improve platform revenue.
Most decision-making problems in marketing can be formulated as resource allocation problems and have been studied for decades.
Existing works usually divide the solution procedure into two fully decoupled stages, i.e., machine learning (ML) and operations research (OR).
arXiv Detail & Related papers (2022-11-28T19:27:34Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the sender's optimal long-term cumulative utilities.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
- Exploration-Exploitation in Constrained MDPs [79.23623305214275]
We investigate the exploration-exploitation dilemma in Constrained Markov Decision Processes (CMDPs).
While learning in an unknown CMDP, an agent must trade off exploration, to discover new information about the MDP, against exploitation of its current knowledge.
While the agent will eventually learn a good or optimal policy, we do not want the agent to violate the constraints too often during the learning process.
arXiv Detail & Related papers (2020-03-04T17:03:56Z)
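For contrast with the bisection approach above, the snippet below sketches the plain primal-dual template that the "Online Learning under Budget and ROI Constraints via Weak Adaptivity" entry builds on: act when the Lagrangian utility is non-negative, and update the multiplier by online gradient ascent on the budget violation. All quantities are toy illustrations; the paper's weakly adaptive regret minimizers are not reproduced here.

```python
import numpy as np

# Illustrative-only primal-dual loop for online decisions under a budget.
rng = np.random.default_rng(0)
T, budget = 10_000, 2_500        # horizon and total budget (toy numbers)
rho = budget / T                 # per-round spend rate implied by the budget
eta = 1.0 / np.sqrt(T)           # dual step size
lam = spent = reward_sum = 0.0

for _ in range(T):
    reward = rng.uniform(0.0, 1.0)   # value and price of the current round
    cost = rng.uniform(0.0, 1.0)
    # Primal step: act when the Lagrangian utility is non-negative and
    # the remaining budget still covers the cost.
    act = reward - lam * cost >= 0.0 and spent + cost <= budget
    if act:
        spent += cost
        reward_sum += reward
    # Dual step: online gradient ascent on the budget violation.
    lam = max(0.0, lam + eta * ((cost if act else 0.0) - rho))

print(f"reward {reward_sum:.1f}, spent {spent:.1f}/{budget}, lambda {lam:.3f}")
```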