Model-based Constrained MDP for Budget Allocation in Sequential
Incentive Marketing
- URL: http://arxiv.org/abs/2303.01049v1
- Date: Thu, 2 Mar 2023 08:10:45 GMT
- Title: Model-based Constrained MDP for Budget Allocation in Sequential
Incentive Marketing
- Authors: Shuai Xiao, Le Guo, Zaifan Jiang, Lei Lv, Yuanbo Chen, Jun Zhu, Shuang
Yang
- Abstract summary: Sequential incentive marketing is an important approach for online businesses to acquire customers, increase loyalty and boost sales.
How to effectively allocate the incentives so as to maximize the return under the budget constraint is less studied in the literature.
We propose an efficient learning algorithm which combines bisection search and model-based planning.
- Score: 28.395877073390434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential incentive marketing is an important approach for online businesses
to acquire customers, increase loyalty and boost sales. How to effectively
allocate the incentives so as to maximize the return (e.g., business
objectives) under the budget constraint, however, is less studied in the
literature. This problem is technically challenging because 1) the allocation
strategy has to be learned from historically logged data, which is
counterfactual in nature, and 2) both the optimality and feasibility (i.e.,
that the cost cannot exceed the budget) need to be assessed before deployment to
online systems. In this paper, we formulate the problem as a constrained Markov
decision process (CMDP). To solve the CMDP problem with logged counterfactual
data, we propose an efficient learning algorithm which combines bisection
search and model-based planning. First, the CMDP is converted into its dual
using Lagrangian relaxation, which is proved to be monotonic with respect to
the dual variable. Furthermore, we show that the dual problem can be solved by
policy learning, with the optimal dual variable being found efficiently via
bisection search (i.e., by taking advantage of the monotonicity). Lastly, we
show that model-based planning can be used to effectively accelerate the joint
optimization process without retraining the policy for every dual variable.
Empirical results on synthetic and real marketing datasets confirm the
effectiveness of our methods.
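As a rough sketch of the bisection step described in the abstract (relying on the stated monotonicity of the dual), the snippet below searches for the dual variable at which a cost estimate meets the budget. Here `expected_cost`, its toy curve, and all parameter values are illustrative assumptions standing in for model-based evaluation of the Lagrangian-optimal policy's expected spend; this is not the paper's implementation.

```python
def bisect_dual(expected_cost, budget, lo=0.0, hi=10.0, tol=1e-6):
    """Find a dual variable whose induced policy respects the budget.

    Assumes expected_cost(lam) is monotonically non-increasing in lam
    (a larger multiplier penalizes spending more heavily) and that
    expected_cost(hi) <= budget at the initial upper bound.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_cost(mid) > budget:
            lo = mid   # policy still overspends: raise the penalty
        else:
            hi = mid   # within budget: try a smaller penalty
    return hi

# Toy stand-in for model-based evaluation of expected spend
# (purely illustrative, not the paper's learned model).
toy_cost = lambda lam: 100.0 / (1.0 + lam)
lam_star = bisect_dual(toy_cost, budget=20.0)   # ~4.0 for this toy curve
```

Because `hi` is only lowered when its cost estimate is within budget, the returned multiplier is always feasible under the monotonicity assumption; model-based planning would replace `toy_cost` so the policy need not be retrained for every candidate multiplier.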
Related papers
- Decision Focused Causal Learning for Direct Counterfactual Marketing Optimization [21.304040539486184]
Decision Focused Learning (DFL) integrates machine learning (ML) and optimization into an end-to-end framework.
However, deploying DFL in marketing is non-trivial due to multiple technological challenges.
We propose a decision focused causal learning framework (DFCL) for direct counterfactual marketing.
arXiv Detail & Related papers (2024-07-18T16:39:44Z)
- Data-Driven Knowledge Transfer in Batch $Q^*$ Learning [5.6665432569907646]
We explore knowledge transfer in dynamic decision-making by concentrating on batch stationary environments.
We propose a framework of Transferred Fitted $Q$-Iteration algorithm with general function approximation.
We show that the final learning error of the $Q^*$ function is significantly improved over the single-task rate.
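The Transferred Fitted $Q$-Iteration entry above builds on standard fitted Q-iteration; purely as an illustration of that batch backbone (not the paper's transfer mechanism or its general function approximation), here is a minimal tabular sketch over logged (s, a, r, s') transitions, with all states, actions, and hyperparameters hypothetical:

```python
def fitted_q_iteration(batch, n_states, n_actions, gamma=0.9, iters=50):
    """Tabular fitted Q-iteration on a logged batch of (s, a, r, s2)
    transitions: repeatedly regress Q(s, a) onto the bootstrapped
    targets r + gamma * max_a' Q(s2, a')."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        targets = {}
        for s, a, r, s2 in batch:
            targets.setdefault((s, a), []).append(r + gamma * max(Q[s2]))
        newQ = [row[:] for row in Q]
        for (s, a), ys in targets.items():
            newQ[s][a] = sum(ys) / len(ys)  # tabular "fit" = mean target
        Q = newQ
    return Q

# Toy batch: in state 0, action 1 earns reward 1 and stays; everything
# else leads to an absorbing zero-reward state 1.
batch = [(0, 1, 1.0, 0), (0, 0, 0.0, 1), (1, 0, 0.0, 1), (1, 1, 0.0, 1)]
Q = fitted_q_iteration(batch, n_states=2, n_actions=2)
```

On this toy MDP, Q[0][1] converges toward 1 / (1 - gamma) = 10, while the other entries stay at zero.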
arXiv Detail & Related papers (2024-04-01T02:20:09Z)
- Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z)
- Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing [20.9377115817821]
Marketing is an important mechanism to increase user engagement and improve platform revenue.
Most decision-making problems in marketing can be formulated as resource allocation problems and have been studied for decades.
Existing works usually divide the solution procedure into two fully decoupled stages, i.e., machine learning (ML) and operations research (OR).
arXiv Detail & Related papers (2022-11-28T19:27:34Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Regularized Online Allocation Problems: Fairness and Beyond [7.433931244705934]
We introduce the regularized online allocation problem, a variant that includes a non-linear regularizer acting on the total resource consumption.
In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources.
The objective is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer subject to the resource constraints.
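One standard way to handle such resource constraints online is to maintain a dual price on the resource and accept a request only when its reward beats the priced cost. The sketch below is a toy single-resource dual-descent allocator with no regularizer; the request format, step size, and pacing rule are all illustrative assumptions, not the paper's algorithm.

```python
def online_allocation(requests, budget, eta=0.1):
    """Toy dual-descent allocator: requests are (reward, cost) pairs
    arriving one at a time against a single resource budget."""
    mu = 0.0                         # dual price on the resource
    rho = budget / len(requests)     # target per-round consumption pace
    spent, value = 0.0, 0.0
    for reward, cost in requests:
        take = reward - mu * cost > 0 and spent + cost <= budget
        if take:
            spent += cost
            value += reward
        # Dual update: raise the price when consuming above pace,
        # lower it (toward zero) when consuming below pace.
        mu = max(0.0, mu + eta * ((cost if take else 0.0) - rho))
    return value, spent
```

The hard check `spent + cost <= budget` keeps the allocator feasible even while the dual price is still adapting; the paper's setting additionally folds a non-separable regularizer into the objective, which this toy omits.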
arXiv Detail & Related papers (2020-07-01T14:24:58Z)
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
- Exploration-Exploitation in Constrained MDPs [79.23623305214275]
We investigate the exploration-exploitation dilemma in Constrained Markov Decision Processes (CMDPs).
While learning in an unknown CMDP, an agent must trade off exploring to discover new information about the MDP against exploiting its current knowledge.
Although the agent will eventually learn a good or optimal policy, we do not want it to violate the constraints too often during the learning process.
arXiv Detail & Related papers (2020-03-04T17:03:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.