Online Bidding under RoS Constraints without Knowing the Value
- URL: http://arxiv.org/abs/2503.03195v1
- Date: Wed, 05 Mar 2025 05:25:54 GMT
- Title: Online Bidding under RoS Constraints without Knowing the Value
- Authors: Sushant Vijayan, Zhe Feng, Swati Padmanabhan, Karthikeyan Shanmugam, Arun Suggala, Di Wang,
- Abstract summary: We consider the problem of bidding in online advertising, where an advertiser aims to maximize value while adhering to budget and Return-on-Spend constraints.<n>We propose a novel Upper Confidence Bound (UCB)-style algorithm that carefully manages this trade-off.
- Score: 22.193658401789033
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We consider the problem of bidding in online advertising, where an advertiser aims to maximize value while adhering to budget and Return-on-Spend (RoS) constraints. Unlike prior work that assumes knowledge of the value generated by winning each impression ({e.g.,} conversions), we address the more realistic setting where the advertiser must simultaneously learn the optimal bidding strategy and the value of each impression opportunity. This introduces a challenging exploration-exploitation dilemma: the advertiser must balance exploring different bids to estimate impression values with exploiting current knowledge to bid effectively. To address this, we propose a novel Upper Confidence Bound (UCB)-style algorithm that carefully manages this trade-off. Via a rigorous theoretical analysis, we prove that our algorithm achieves $\widetilde{O}(\sqrt{T\log(|\mathcal{B}|T)})$ regret and constraint violation, where $T$ is the number of bidding rounds and $\mathcal{B}$ is the domain of possible bids. This establishes the first optimal regret and constraint violation bounds for bidding in the online setting with unknown impression values. Moreover, our algorithm is computationally efficient and simple to implement. We validate our theoretical findings through experiments on synthetic data, demonstrating that our algorithm exhibits strong empirical performance compared to existing approaches.
Related papers
- Cost-Control in Display Advertising: Theory vs Practice [0.2574913881702984]
In display advertising, advertisers want to achieve a marketing objective with constraints on budget and cost-per-outcome.
This is usually formulated as an optimization problem that maximizes the total utility under constraints.
The optimization is carried out in an online fashion in the dual space - for an incoming Ad auction, a bid is placed using an optimal bidding formula. based on the outcome of the previous auctions, the dual variables are updated in an online fashion.
While this approach is theoretically sound, in practice, the dual variables are not optimal from the beginning, but rather converge over time.
arXiv Detail & Related papers (2024-09-05T19:22:33Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM)<n>We study a model within this domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.<n>We propose an algorithm namely robust contextual dueling bandits (RCDB), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Online Learning in Contextual Second-Price Pay-Per-Click Auctions [47.06746975822902]
We study online learning in contextual pay-per-click auctions where at each of the $T$ rounds, the learner receives some context along with a set of ads.
The learner's goal is to minimize her regret, defined as the gap between her total revenue and that of an oracle strategy.
arXiv Detail & Related papers (2023-10-08T07:04:22Z) - Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z) - Adaptive Risk-Aware Bidding with Budget Constraint in Display
Advertising [47.14651340748015]
We propose a novel adaptive risk-aware bidding algorithm with budget constraint via reinforcement learning.
We theoretically unveil the intrinsic relation between the uncertainty and the risk tendency based on value at risk (VaR)
arXiv Detail & Related papers (2022-12-06T18:50:09Z) - A Reinforcement Learning Approach in Multi-Phase Second-Price Auction
Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z) - Fast Rate Learning in Stochastic First Price Bidding [0.0]
First-price auctions have largely replaced traditional bidding approaches based on Vickrey auctions in programmatic advertising.
We show how to achieve significantly lower regret when the opponents' maximal bid distribution is known.
Our algorithms converge much faster than alternatives proposed in the literature for various bid distributions.
arXiv Detail & Related papers (2021-07-05T07:48:52Z) - Efficient Algorithms for Stochastic Repeated Second-price Auctions [0.0]
We develop efficient sequential bidding strategies for repeated auctions.
We provide the first parametric lower bound for this problem.
We propose more explainable strategies which are reminiscent of the Explore Then Commit bandit algorithm.
arXiv Detail & Related papers (2020-11-10T12:45:02Z) - Learning to Bid Optimally and Efficiently in Adversarial First-price
Auctions [40.30925727499806]
We develop the first minimax optimal online bidding algorithm that achieves an $widetildeO(sqrtT)$ regret.
We test our algorithm on three real-world first-price auction datasets obtained from Verizon Media.
arXiv Detail & Related papers (2020-07-09T05:40:39Z) - Optimal Bidding Strategy without Exploration in Real-time Bidding [14.035270361462576]
maximizing utility with a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems.
Previous works ignore the losing auctions to alleviate the difficulty with censored states.
We propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic.
arXiv Detail & Related papers (2020-03-31T20:43:28Z) - Optimal No-regret Learning in Repeated First-price Auctions [38.908235632001116]
We study online learning in repeated first-price auctions.
We develop the first learning algorithm that achieves a near-optimal $widetildeO(sqrtT)$ regret bound.
arXiv Detail & Related papers (2020-03-22T03:32:09Z) - Adversarial Attacks on Linear Contextual Bandits [87.08004581867537]
Malicious agents may have incentives to attack the bandit algorithm to induce it to perform a desired behavior.
We show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm $T - o(T)$ times over a horizon of $T$ steps.
We also investigate the case when a malicious agent is interested in affecting the behavior of the bandit algorithm in a single context.
arXiv Detail & Related papers (2020-02-10T15:04:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.