Optimal Spend Rate Estimation and Pacing for Ad Campaigns with Budgets
- URL: http://arxiv.org/abs/2202.05881v1
- Date: Fri, 4 Feb 2022 20:18:00 GMT
- Title: Optimal Spend Rate Estimation and Pacing for Ad Campaigns with Budgets
- Authors: Bhuvesh Kumar, Jamie Morgenstern, and Okke Schrijvers
- Abstract summary: This paper considers two models for impressions and competition that vary with time.
We present the first learning theoretic guarantees on both the accuracy of spend plans and the resulting end-to-end budget management system.
- Score: 6.870572485624023
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Online ad platforms offer budget management tools for advertisers that aim to
maximize the number of conversions given a budget constraint. As the volume of
impressions, conversion rates and prices vary over time, these budget
management systems learn a spend plan (to find the optimal distribution of
budget over time) and run a pacing algorithm which follows the spend plan.
This paper considers two models for impressions and competition that vary
with time: a) an episodic model which exhibits stationarity in each episode,
but each episode can be arbitrarily different from the next, and b) a model
where the distributions of prices and values change slowly over time. We
present the first learning theoretic guarantees on both the accuracy of spend
plans and the resulting end-to-end budget management system. We present four
main results: 1) for the episodic setting we give sample complexity bounds for
the spend rate prediction problem: given $n$ samples from each episode, with
high probability we have $|\widehat{\rho}_e - \rho_e| \leq
\tilde{O}(\frac{1}{n^{1/3}})$ where $\rho_e$ is the optimal spend rate for the
episode, $\widehat{\rho}_e$ is the estimate from our algorithm, 2) we extend
the algorithm of Balseiro and Gur (2017) to operate on varying, approximate
spend rates and show that the resulting combined system of optimal spend rate
estimation and online pacing algorithm for episodic settings has regret that
vanishes in the number of historic samples $n$ and the number of rounds $T$, 3) for
non-episodic but slowly-changing distributions we show that the same approach
approximates the optimal bidding strategy up to a factor dependent on the
rate-of-change of the distributions and 4) we provide experiments showing that
our algorithm outperforms both static spend plans and non-pacing across a wide
variety of settings.
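The end-to-end system the abstract describes pairs a learned spend plan with an online pacing loop. Below is a minimal sketch of that pairing in the style of Balseiro and Gur's dual-multiplier pacing, run against simulated second-price auctions; the function names, the uniform plan, the step size, and the simulated data are all illustrative assumptions, not the paper's implementation.

```python
import random

def paced_bidding(values, prices, spend_plan, step=0.01):
    """Dual-multiplier pacing sketch: shade each bid by 1/(1 + mu) and
    nudge mu toward the per-round target spend (Balseiro-Gur style)."""
    mu, total_spend, wins = 0.0, 0.0, 0
    for v, p, rho in zip(values, prices, spend_plan):
        bid = v / (1.0 + mu)             # shaded bid
        spend = p if bid >= p else 0.0   # second-price auction: pay price on win
        total_spend += spend
        wins += int(bid >= p)
        # overspending the plan raises mu (more shading); underspending lowers it
        mu = max(0.0, mu + step * (spend - rho))
    return wins, total_spend

random.seed(0)
T = 1000
values = [random.uniform(0, 1) for _ in range(T)]
prices = [random.uniform(0, 1) for _ in range(T)]
plan = [0.1] * T  # uniform spend plan; the paper instead learns a time-varying one
wins, spent = paced_bidding(values, prices, plan)
```

Replacing the uniform `plan` with per-episode estimates $\widehat{\rho}_e$ is exactly the role of the spend rate estimation step the abstract describes.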
Related papers
- No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need! [56.80767500991973]
We focus on two canonical settings: $(i)$ online resource allocation where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints where they are observed after action selection, under full feedback or bandit feedback. It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time. We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan. Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget.
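The spending-plan-relative dual update this summary describes can be sketched as follows for the full-feedback setting $(i)$; the zero-cost "do nothing" action, the learning rate, and the synthetic data are illustrative assumptions, not the paper's algorithm.

```python
import random

def plan_aware_primal_dual(rewards, costs, plan, eta=0.05):
    """Full-feedback sketch: each round, per-action rewards/costs are observed
    before acting; pick the action maximizing the Lagrangian score
    reward - lambda * cost, then move lambda toward the spending plan."""
    lam, total_reward, spent = 0.0, 0.0, 0.0
    for r_t, c_t, b_t in zip(rewards, costs, plan):
        # action 0 is "do nothing" (zero reward, zero cost)
        j = max(range(len(r_t)), key=lambda a: r_t[a] - lam * c_t[a])
        total_reward += r_t[j]
        spent += c_t[j]
        lam = max(0.0, lam + eta * (c_t[j] - b_t))  # dual update vs. the plan
    return total_reward, spent

random.seed(1)
T, K = 500, 3
rewards = [[0.0] + [random.random() for _ in range(K)] for _ in range(T)]
costs = [[0.0] + [random.random() for _ in range(K)] for _ in range(T)]
plan = [0.3] * T  # per-round budget slice prescribed by the spending plan
reward, spent = plan_aware_primal_dual(rewards, costs, plan)
```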
arXiv Detail & Related papers (2025-06-16T08:42:31Z) - Better Rates for Random Task Orderings in Continual Linear Models [50.11453013647086]
We analyze the forgetting, i.e., loss on previously seen tasks, after $k$ iterations.
We develop novel last-iterate bounds in the realizable least squares setup, and apply them to derive new results for continual learning.
We prove for the first time that randomization alone, with no task repetition, can prevent catastrophic forgetting in sufficiently long task sequences.
arXiv Detail & Related papers (2025-04-06T18:39:45Z) - Optimal Decentralized Smoothed Online Convex Optimization [9.449153668916098]
We study the multi-agent Smoothed Online Convex Optimization (SOCO) problem, where $N$ agents interact through a communication graph.
We propose the first truly decentralized algorithm ACORD for multi-agent SOCO that provably exhibits optimality.
Our results hold even when the communication graph changes arbitrarily and adaptively over time.
arXiv Detail & Related papers (2024-11-13T05:59:04Z) - Stochastic Multi-round Submodular Optimization with Budget [7.902059578326225]
We aim to adaptively maximize the sum, over multiple rounds, of a monotone submodular objective function defined on subsets of items.
The objective function also depends on the realization of events, and the total number of items we can select over all rounds is bounded by a limited budget.
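For intuition, plain greedy on a single round of monotone submodular (coverage) maximization under a cardinality budget looks like the sketch below; the coverage objective and the example sets are illustrative, and the paper's adaptive multi-round, event-dependent setting adds machinery beyond this.

```python
def greedy_coverage(universe_sets, budget):
    """Plain greedy for monotone submodular maximization (coverage style)
    under a cardinality budget; gives the classic (1 - 1/e) guarantee."""
    covered, chosen = set(), []
    for _ in range(budget):
        best, gain = None, 0
        for i, s in enumerate(universe_sets):
            g = len(s - covered)  # marginal coverage gain of set i
            if g > gain:
                best, gain = i, g
        if best is None:  # no remaining set adds coverage
            break
        chosen.append(best)
        covered |= universe_sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
chosen, covered = greedy_coverage(sets, budget=2)  # picks sets 0 and 2
```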
arXiv Detail & Related papers (2024-04-21T18:24:43Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems in which no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Rate-Optimal Policy Optimization for Linear Markov Decision Processes [65.5958446762678]
We obtain rate-optimal $\widetilde{O}(\sqrt{K})$ regret where $K$ denotes the number of episodes.
Our work is the first to establish the optimal (w.r.t. $K$) rate of convergence in the setting with bandit feedback; previously, no algorithm with an optimal rate guarantee was known.
arXiv Detail & Related papers (2023-08-28T15:16:09Z) - Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player updates the model and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of the riskiest distributions.
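As a loose illustration of the two-player reduction with one sample per round, the sketch below pairs an EXP3-style sampler over groups with plain SGD for the learner on synthetic quadratic losses; the sampler, learner, losses, and all parameters are simplifying assumptions, and the paper's actual algorithm differs.

```python
import math
import random

def gdro_one_sample(group_losses, m, T, lr=0.05, eta=0.1):
    """Two-player GDRO sketch, one sample per round: a bandit player keeps
    log-weights over the m group distributions (upweighting high-loss groups),
    while the learner takes an SGD step on the sampled group's loss."""
    w = [0.0] * m  # log-weights over groups
    x = 0.0        # 1-D model parameter, for illustration only
    for _ in range(T):
        z = max(w)  # subtract max for numerical stability
        probs = [math.exp(v - z) for v in w]
        s = sum(probs)
        probs = [p / s for p in probs]
        g = random.choices(range(m), weights=probs)[0]  # one group sampled
        loss, grad = group_losses[g](x)
        x -= lr * grad                 # learner: SGD step on sampled group
        w[g] += eta * loss / probs[g]  # sampler: importance-weighted update
    return x

random.seed(2)
centers = [-1.0, 0.0, 2.0]
groups = [
    (lambda c: (lambda x: ((x - c) ** 2, 2 * (x - c) + random.gauss(0, 0.1))))(c)
    for c in centers
]
x = gdro_one_sample(groups, m=3, T=2000)
```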
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Robust Budget Pacing with a Single Sample [9.826939499674676]
We show that just one sample per distribution is enough to achieve the near-optimal $\tilde{O}(\sqrt{T})$-regret.
arXiv Detail & Related papers (2023-02-03T21:23:09Z) - Human-in-the-loop: Provably Efficient Preference-based Reinforcement
Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z) - REX: Revisiting Budgeted Training with an Improved Schedule [14.618325490983052]
We propose a novel profile and sampling rate combination called the Reflected Exponential (REX) schedule.
REX outperforms the linear schedule in the low budget regime, while matching or exceeding the performance of several state-of-the-art learning rate schedules.
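As a sketch, one commonly cited closed form of the REX learning-rate profile can be computed as below; treat the exact expression as an assumption and check it against the paper before relying on it.

```python
def rex_lr(t, T, lr0=1.0):
    """Reflected Exponential (REX) learning-rate value at step t of T:
    lr0 * (1 - t/T) / (1/2 + (1 - t/T)/2). Decays slowly early in training
    and quickly near the end of the budget."""
    z = 1.0 - t / T
    return lr0 * z / (0.5 + 0.5 * z)

schedule = [rex_lr(t, 100) for t in range(101)]  # monotone, 1.0 down to 0.0
```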
arXiv Detail & Related papers (2021-07-09T04:17:35Z) - Online Stochastic Optimization with Wasserstein Based Non-stationarity [12.91020811577007]
We consider a general online optimization problem with multiple budget constraints over a horizon of finite time periods.
The objective of the decision maker is to maximize the cumulative reward subject to the budget constraints.
This formulation captures a wide range of applications including online linear programming and network revenue management.
arXiv Detail & Related papers (2020-12-13T04:47:37Z) - Minimax Regret for Stochastic Shortest Path with Adversarial Costs and
Known Transition [37.6975819766632]
We study the shortest path problem with adversarial costs and known transition.
We show that the minimax regret is $\widetilde{O}(\sqrt{D T_{\star} K})$ for the full-information setting and $\widetilde{O}(\sqrt{D T_{\star} S A K})$ for the bandit feedback setting.
arXiv Detail & Related papers (2020-12-07T20:55:28Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages of our methods over standard gradient methods on two popular deep learning tasks.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Stochastic Shortest Path with Adversarially Changing Costs [57.90236104782219]
Stochastic shortest path (SSP) is a well-known problem in planning and control.
We present the adversarial SSP model that also accounts for adversarial changes in the costs over time.
We are the first to consider this natural setting of adversarial SSP and obtain sub-linear regret for it.
arXiv Detail & Related papers (2020-06-20T12:10:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.