ROI Constrained Bidding via Curriculum-Guided Bayesian Reinforcement
Learning
- URL: http://arxiv.org/abs/2206.05240v2
- Date: Tue, 14 Jun 2022 08:55:21 GMT
- Title: ROI Constrained Bidding via Curriculum-Guided Bayesian Reinforcement
Learning
- Authors: Haozhe Wang, Chao Du, Panyan Fang, Shuo Yuan, Xuming He, Liang Wang,
Bo Zheng
- Abstract summary: We specialize in ROI-Constrained Bidding in non-stationary markets.
Based on a Partially Observable Constrained Markov Decision Process, we propose the first hard barrier solution to accommodate non-monotonic constraints.
Our method exploits a parameter-free indicator-augmented reward function and develops a Curriculum-Guided Bayesian Reinforcement Learning framework.
- Score: 34.82004227655201
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-Time Bidding (RTB) is an important mechanism in modern online
advertising systems. Advertisers employ bidding strategies in RTB to optimize
their advertising effects subject to various financial requirements, among
which a widely adopted one is the return-on-investment (ROI) constraint. ROIs
change non-monotonically during the sequential bidding process, usually
presenting a see-saw effect between constraint satisfaction and objective
optimization. Existing solutions to the constraint-objective trade-off are
typically established in static or mildly changing markets. However, these
methods fail significantly in non-stationary advertising markets due to their
inability to adapt to varying dynamics and partial observability. In this work,
we specialize in ROI-Constrained Bidding in non-stationary markets. Based on a
Partially Observable Constrained Markov Decision Process, we propose the first
hard barrier solution to accommodate non-monotonic constraints. Our method
exploits a parameter-free indicator-augmented reward function and develops a
Curriculum-Guided Bayesian Reinforcement Learning (CBRL) framework to
adaptively control the constraint-objective trade-off in non-stationary
advertising markets. Extensive experiments on a large-scale industrial dataset
with two problem settings reveal that CBRL generalizes well in both
in-distribution and out-of-distribution data regimes, and enjoys outstanding
stability.
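The non-monotonic behaviour of ROI mentioned in the abstract can be illustrated with a minimal sketch. It assumes ROI is defined as cumulative value divided by cumulative cost over the auctions won so far. The bid log, the `ROI_TARGET` threshold, and the helper names are hypothetical and chosen only for illustration. This is not the paper's CBRL method; it only shows why an ROI constraint can flip between satisfied and violated during an episode, whereas cumulative spend (as in a budget constraint) only ever grows.

```python
from itertools import accumulate


def running_roi(values, costs):
    """ROI after each won auction: cumulative value / cumulative cost.

    Assumes every cost is strictly positive (hypothetical helper).
    """
    cum_value = accumulate(values)
    cum_cost = accumulate(costs)
    return [v / c for v, c in zip(cum_value, cum_cost)]


def is_monotonic(seq):
    """True if seq is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(seq, seq[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Hypothetical bid log: value obtained and price paid for each won auction.
values = [4.0, 1.0, 6.0, 1.0]
costs = [2.0, 2.0, 2.0, 2.0]

ROI_TARGET = 1.6  # hypothetical advertiser requirement

roi = running_roi(values, costs)
spend = list(accumulate(costs))

# Constraint status after each step; it can flip back and forth.
satisfied = [r >= ROI_TARGET for r in roi]

print("ROI per step:   ", [round(r, 3) for r in roi])
print("ROI satisfied:  ", satisfied)
print("Spend per step: ", spend)
print("ROI monotonic?  ", is_monotonic(roi))
print("Spend monotonic?", is_monotonic(spend))
```

In this toy log the constraint status alternates from step to step, so a policy cannot infer final feasibility from the current status alone. This is one way to read the "see-saw effect" the abstract describes.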
Related papers
- Improve ROI with Causal Learning and Conformal Prediction [8.430828492374072]
This study delves into the Cost-aware Binary Treatment Assignment Problem (C-BTAP) across different industries.
It focuses on the state-of-the-art Direct ROI Prediction (DRP) method.
Addressing these challenges is essential for ensuring dependable and robust predictions in varied operational contexts.
arXiv Detail & Related papers (2024-07-01T08:16:25Z)
- Deep Hedging with Market Impact [0.20482269513546458]
We propose a novel general market impact dynamic hedging model based on Deep Reinforcement Learning (DRL).
The optimal policy obtained from the DRL model is analysed using several option hedging simulations and compared to commonly used procedures such as delta hedging.
arXiv Detail & Related papers (2024-02-20T19:08:24Z)
- Insurance pricing on price comparison websites via reinforcement learning [7.023335262537794]
This paper introduces a reinforcement learning framework that learns an optimal pricing policy by integrating model-based and model-free methods.
The paper also highlights the importance of evaluating pricing policies using an offline dataset in a consistent fashion.
arXiv Detail & Related papers (2023-08-14T04:44:56Z)
- Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees that hold in the absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z)
- Stochastic Methods for AUC Optimization subject to AUC-based Fairness Constraints [51.12047280149546]
A direct approach for obtaining a fair predictive model is to train the model through optimizing its prediction performance subject to fairness constraints.
We formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints.
We demonstrate the effectiveness of our approach on real-world data under different fairness metrics.
arXiv Detail & Related papers (2022-12-23T22:29:08Z)
- Adaptive Risk-Aware Bidding with Budget Constraint in Display Advertising [47.14651340748015]
We propose a novel adaptive risk-aware bidding algorithm with budget constraint via reinforcement learning.
We theoretically unveil the intrinsic relation between the uncertainty and the risk tendency based on value at risk (VaR).
arXiv Detail & Related papers (2022-12-06T18:50:09Z)
- VFed-SSD: Towards Practical Vertical Federated Advertising [53.08038962443853]
We propose a semi-supervised split distillation framework VFed-SSD to alleviate the two limitations.
Specifically, we develop a self-supervised task MatchedPair Detection (MPD) to exploit the vertically partitioned unlabeled data.
Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deploying cost and significant performance lift.
arXiv Detail & Related papers (2022-05-31T17:45:30Z)
- Demand Responsive Dynamic Pricing Framework for Prosumer Dominated Microgrids using Multiagent Reinforcement Learning [59.28219519916883]
This paper proposes a new multiagent Reinforcement Learning based decision-making environment for implementing a Real-Time Pricing (RTP) DR technique in a prosumer dominated microgrid.
The proposed technique addresses several shortcomings common to traditional DR methods and provides significant economic benefits to the grid operator and prosumers.
arXiv Detail & Related papers (2020-09-23T01:44:57Z)
- Optimal Bidding Strategy without Exploration in Real-time Bidding [14.035270361462576]
Maximizing utility with a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems.
Previous works ignore the losing auctions to alleviate the difficulty with censored states.
We propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic.
arXiv Detail & Related papers (2020-03-31T20:43:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.