Optimal Bidding Strategy without Exploration in Real-time Bidding
- URL: http://arxiv.org/abs/2004.00100v1
- Date: Tue, 31 Mar 2020 20:43:28 GMT
- Title: Optimal Bidding Strategy without Exploration in Real-time Bidding
- Authors: Aritra Ghosh, Saayan Mitra, Somdeb Sarkhel, Viswanathan Swaminathan
- Abstract summary: Maximizing utility with a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems.
Previous works ignore the losing auctions to alleviate the difficulty with censored states.
We propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic.
- Score: 14.035270361462576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maximizing utility with a budget constraint is the primary goal for
advertisers in real-time bidding (RTB) systems. The policy maximizing the
utility is referred to as the optimal bidding strategy. Earlier works on
optimal bidding strategy apply model-based batch reinforcement learning methods
which cannot generalize to unknown budget and time constraints. Further, the
advertiser observes a censored market price which makes direct evaluation
infeasible on batch test datasets. Previous works ignore the losing auctions to
alleviate the difficulty with censored states; thus significantly modifying the
test distribution. We address the challenge of lacking a clear evaluation
procedure as well as the error propagated through batch reinforcement learning
methods in RTB systems. We exploit two conditional independence structures in
the sequential bidding process that allow us to propose a novel practical
framework using the maximum entropy principle to imitate the behavior of the
true distribution observed in real-time traffic. Moreover, the framework allows
us to train a model that can generalize to unseen budget conditions rather than
being limited only to those observed in history. We compare our methods on two
real-world RTB datasets with several baselines and demonstrate significantly
improved performance under various budget settings.
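The abstract's point about censored market prices can be made concrete with a small simulation. The sketch below (illustrative only; the function and field names are not from the paper) shows why discarding losing auctions, as earlier works do, shifts the data distribution: losses are exactly the impressions whose market price exceeded the bid, so dropping them biases the observed prices downward.

```python
import random

def observe_auction(bid, market_price):
    """Simulate the censored feedback an advertiser receives in a
    second-price RTB auction: the market price is revealed only on a
    win; on a loss we learn just a lower bound on it."""
    if bid >= market_price:
        # Win: pay the market (second) price and observe it exactly.
        return {"won": True, "price": market_price}
    # Loss: the exact market price stays hidden; we only know it
    # exceeds our bid (a right-censored observation).
    return {"won": False, "price_lower_bound": bid}

random.seed(0)
logs = [observe_auction(bid=1.0, market_price=random.uniform(0.5, 2.0))
        for _ in range(1000)]
wins = [x for x in logs if x["won"]]
losses = [x for x in logs if not x["won"]]
# Every observed price is at most the bid, so a model trained only on
# `wins` never sees the expensive part of the market-price distribution.
print(len(wins), len(losses))
```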
Related papers
- Metalearners for Ranking Treatment Effects [1.469168639465869]
We show how learning to rank can maximize the area under a policy's incremental profit curve.
arXiv Detail & Related papers (2024-05-03T15:31:18Z)
- Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning [5.372349090093469]
This work studies offline personalized pricing under endogeneity using an instrumental variable approach.
We propose a new policy learning method for Personalized pRicing using Invalid iNsTrumental variables.
arXiv Detail & Related papers (2023-02-24T14:50:47Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z)
- Adaptive Risk-Aware Bidding with Budget Constraint in Display Advertising [47.14651340748015]
We propose a novel adaptive risk-aware bidding algorithm with budget constraint via reinforcement learning.
We theoretically unveil the intrinsic relation between the uncertainty and the risk tendency based on value at risk (VaR).
arXiv Detail & Related papers (2022-12-06T18:50:09Z)
- Latent State Marginalization as a Low-cost Approach for Improving Exploration [79.12247903178934]
We propose the adoption of latent variable policies within the MaxEnt framework.
We show that latent variable policies naturally emerge under the use of world models with a latent belief state.
We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training.
arXiv Detail & Related papers (2022-10-03T15:09:12Z)
- ROI Constrained Bidding via Curriculum-Guided Bayesian Reinforcement Learning [34.82004227655201]
We specialize in ROI-Constrained Bidding in non-stationary markets.
Based on a Partially Observable Constrained Markov Decision Process, we propose the first hard barrier solution to accommodate non-monotonic constraints.
Our method exploits a parameter-free indicator-augmented reward function and develops a Curriculum-Guided Bayesian Reinforcement Learning framework.
arXiv Detail & Related papers (2022-06-10T17:30:12Z)
- Arbitrary Distribution Modeling with Censorship in Real-Time Bidding Advertising [2.562910030418378]
The purpose of Inventory Pricing is to bid the right prices for online ad opportunities, which is crucial for a Demand-Side Platform (DSP) to win auctions in Real-Time Bidding (RTB).
Most of the previous works made strong assumptions on the distribution form of the winning price, which reduced their accuracy and weakened their ability to make generalizations.
We propose a novel loss function, Neighborhood Likelihood Loss (NLL), collaborating with a proposed framework, Arbitrary Distribution Modeling (ADM) to predict the winning price distribution under censorship.
arXiv Detail & Related papers (2021-10-26T11:40:00Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation [49.69139684065241]
Contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
- Scalable Bid Landscape Forecasting in Real-time Bidding [12.692521867728091]
In programmatic advertising, ad slots are usually sold using second-price (SP) auctions in real-time.
In SP, for a single item, the dominant strategy of each bidder is to bid the true value from the bidder's perspective.
We propose a heteroscedastic fully parametric censored regression approach, as well as a mixture density censored network.
arXiv Detail & Related papers (2020-01-18T03:20:05Z)
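The last entry's claim that truthful bidding is the dominant strategy in a single-item second-price auction can be checked with a minimal sketch (the function name and bid values below are illustrative, not from the paper): for any competing price, bidding the true value never yields less utility than shading or overbidding.

```python
def sp_utility(bid, value, highest_other_bid):
    """Utility in a single-item second-price auction: win iff our bid
    beats the highest competing bid, and pay that competing bid."""
    return value - highest_other_bid if bid > highest_other_bid else 0.0

value = 1.0
for other in [0.4, 0.8, 1.2, 1.6]:
    truthful = sp_utility(1.0, value, other)   # bid = true value
    shaded = sp_utility(0.6, value, other)     # risks losing a profitable win
    over = sp_utility(1.4, value, other)       # risks winning above value
    assert truthful >= shaded and truthful >= over
```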
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.