Optimal No-regret Learning in Repeated First-price Auctions
- URL: http://arxiv.org/abs/2003.09795v7
- Date: Mon, 4 Mar 2024 23:27:02 GMT
- Title: Optimal No-regret Learning in Repeated First-price Auctions
- Authors: Yanjun Han, Zhengyuan Zhou, Tsachy Weissman
- Abstract summary: We study online learning in repeated first-price auctions.
We develop the first learning algorithm that achieves a near-optimal $widetildeO(sqrtT)$ regret bound.
- Score: 38.908235632001116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online learning in repeated first-price auctions where a bidder,
only observing the winning bid at the end of each auction, learns to adaptively
bid in order to maximize her cumulative payoff. To achieve this goal, the
bidder faces censored feedback: if she wins the bid, then she is not able to
observe the highest bid of the other bidders, which we assume is \textit{iid}
drawn from an unknown distribution. In this paper, we develop the first
learning algorithm that achieves a near-optimal $\widetilde{O}(\sqrt{T})$
regret bound, by exploiting two structural properties of first-price auctions,
i.e. the specific feedback structure and payoff function.
We first formulate the feedback structure in first-price auctions as
partially ordered contextual bandits, a combination of the graph feedback
across actions (bids), the cross learning across contexts (private values), and
a partial order over the contexts. We establish both strengths and weaknesses
of this framework, by showing a curious separation that a regret nearly
independent of the action/context sizes is possible under stochastic contexts,
but is impossible under adversarial contexts. In particular, this framework
leads to an $O(\sqrt{T}\log^{2.5}T)$ regret for first-price auctions when the
bidder's private values are \emph{iid}.
Despite the limitation of the above framework, we further exploit the special
payoff function of first-price auctions to develop a sample-efficient algorithm
even in the presence of adversarially generated private values. We establish an
$O(\sqrt{T}\log^3 T)$ regret bound for this algorithm, hence providing a
complete characterization of optimal learning guarantees for first-price
auctions.
Related papers
- Randomized Truthful Auctions with Learning Agents [10.39657928150242]
We study a setting where agents use no-regret learning to participate in repeated auctions.
We show that when bidders participate in second-price auctions using no-regret bidding algorithms, the runner-up bidder may not converge to bidding truthfully.
We define a notion of em auctioneer regret comparing the revenue generated to the revenue of a second price auction with bids.
arXiv Detail & Related papers (2024-11-14T15:28:40Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM)
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm namely robust contextual dueling bandit (algo), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Online Learning in Contextual Second-Price Pay-Per-Click Auctions [47.06746975822902]
We study online learning in contextual pay-per-click auctions where at each of the $T$ rounds, the learner receives some context along with a set of ads.
The learner's goal is to minimize her regret, defined as the gap between her total revenue and that of an oracle strategy.
arXiv Detail & Related papers (2023-10-08T07:04:22Z) - Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics [53.62091043347035]
We study a game between autobidding algorithms that compete in an online advertising platform.
We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret.
arXiv Detail & Related papers (2023-01-30T21:59:30Z) - A Reinforcement Learning Approach in Multi-Phase Second-Price Auction
Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z) - No-regret Learning in Repeated First-Price Auctions with Budget
Constraints [5.834615090865286]
We propose an RL-based bidding algorithm against the optimal non-anticipating strategy under stationary competition.
Our algorithm obtains $widetilde O(sqrt T)$-regret if the bids are all revealed at the end of each round.
arXiv Detail & Related papers (2022-05-29T04:32:05Z) - Fast Rate Learning in Stochastic First Price Bidding [0.0]
First-price auctions have largely replaced traditional bidding approaches based on Vickrey auctions in programmatic advertising.
We show how to achieve significantly lower regret when the opponents' maximal bid distribution is known.
Our algorithms converge much faster than alternatives proposed in the literature for various bid distributions.
arXiv Detail & Related papers (2021-07-05T07:48:52Z) - Efficient Algorithms for Stochastic Repeated Second-price Auctions [0.0]
We develop efficient sequential bidding strategies for repeated auctions.
We provide the first parametric lower bound for this problem.
We propose more explainable strategies which are reminiscent of the Explore Then Commit bandit algorithm.
arXiv Detail & Related papers (2020-11-10T12:45:02Z) - ProportionNet: Balancing Fairness and Revenue for Auction Design with
Deep Learning [55.76903822619047]
We study the design of revenue-maximizing auctions with strong incentive guarantees.
We extend techniques for approximating auctions using deep learning to address concerns of fairness while maintaining high revenue and strong incentive guarantees.
arXiv Detail & Related papers (2020-10-13T13:54:21Z) - Learning to Bid Optimally and Efficiently in Adversarial First-price
Auctions [40.30925727499806]
We develop the first minimax optimal online bidding algorithm that achieves an $widetildeO(sqrtT)$ regret.
We test our algorithm on three real-world first-price auction datasets obtained from Verizon Media.
arXiv Detail & Related papers (2020-07-09T05:40:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.