Improved learning rates in multi-unit uniform price auctions
- URL: http://arxiv.org/abs/2501.10181v1
- Date: Fri, 17 Jan 2025 13:26:12 GMT
- Title: Improved learning rates in multi-unit uniform price auctions
- Authors: Marius Potfer, Dorian Baudry, Hugo Richard, Vianney Perchet, Cheng Wan
- Abstract summary: We study the problem of online learning in repeated multi-unit uniform price auctions focusing on the adversarial opposing bid setting.
We prove that a learning algorithm leveraging the structure of this problem achieves a regret of $\tilde{O}(K^{4/3}T^{2/3})$ under bandit feedback.
Inspired by electricity reserve markets, we introduce a different feedback model under which all winning bids are revealed.
- Score: 20.8319469276025
- Abstract: Motivated by the strategic participation of electricity producers in the electricity day-ahead market, we study the problem of online learning in repeated multi-unit uniform price auctions, focusing on the adversarial opposing bid setting. The main contribution of this paper is the introduction of a new modeling of the bid space. Indeed, we prove that a learning algorithm leveraging the structure of this problem achieves a regret of $\tilde{O}(K^{4/3}T^{2/3})$ under bandit feedback, improving over the bound of $\tilde{O}(K^{7/4}T^{3/4})$ previously obtained in the literature. This improved regret rate is tight up to logarithmic terms. Inspired by electricity reserve markets, we further introduce a different feedback model under which all winning bids are revealed. This feedback interpolates between the full-information and bandit scenarios depending on the auctions' results. We prove that, under this feedback, the algorithm that we propose achieves regret $\tilde{O}(K^{5/2}\sqrt{T})$.
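To make the setting concrete, below is a minimal, hypothetical Python sketch of a single round of a K-unit uniform price auction from the learner's point of view, together with the two feedback models mentioned in the abstract. The clearing convention (price set by the (K+1)-st highest bid), the function names, and the shapes of the feedback are illustrative assumptions for this sketch, not the paper's exact model.

```python
# A minimal, hypothetical sketch of one round of a K-unit uniform price auction,
# illustrating the clearing rule and the two feedback models discussed in the
# abstract (bandit feedback vs. "all winning bids revealed"). The clearing and
# feedback conventions below are assumptions for illustration only.
import numpy as np

def uniform_price_round(my_bids, opposing_bids, my_values, K):
    """Clear a K-unit uniform price auction for a single learner.

    my_bids       : the learner's K bids (one per unit).
    opposing_bids : the adversary's (opposing) bids.
    my_values     : the learner's marginal values for the 1st, 2nd, ... unit won.
    Returns (utility, bandit_feedback, winning_bids_feedback).
    """
    all_bids = np.concatenate([my_bids, opposing_bids])
    order = np.argsort(all_bids)[::-1]           # bid indices, highest first
    winners = order[:K]                          # the K highest bids each win a unit
    # Assumed convention: the uniform price is the (K+1)-st highest bid.
    price = np.sort(all_bids)[::-1][K]

    units_won = int(np.sum(winners < len(my_bids)))
    # Learner's utility: marginal values of the units it won, minus the uniform price per unit.
    utility = float(np.sum(my_values[:units_won]) - units_won * price)

    # Bandit feedback: the learner only observes its own realized utility.
    bandit_feedback = utility
    # Richer, reserve-market-inspired feedback: every winning bid is revealed.
    winning_bids_feedback = np.sort(all_bids[winners])[::-1]
    return utility, bandit_feedback, winning_bids_feedback

# Example round with K = 3 units.
rng = np.random.default_rng(0)
u, bandit, revealed = uniform_price_round(
    my_bids=np.array([0.9, 0.6, 0.2]),
    opposing_bids=rng.uniform(size=3),
    my_values=np.array([1.0, 0.8, 0.5]),
    K=3,
)
print(u, bandit, revealed)
```

Under bandit feedback the learner sees only its own realized utility, whereas under the richer feedback it also observes all winning bids; the abstract's rates, $\tilde{O}(K^{4/3}T^{2/3})$ versus $\tilde{O}(K^{5/2}\sqrt{T})$, reflect this difference in observed information.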
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - No-Regret Learning in Bilateral Trade via Global Budget Balance [29.514323697659613]
We provide the first no-regret algorithms for adversarial bilateral trade under various feedback models.
We show that in the full-feedback model, the learner can guarantee $\tilde{O}(\sqrt{T})$ regret against the best fixed prices in hindsight.
We also provide a learning algorithm guaranteeing a $\tilde{O}(T^{3/4})$ regret upper bound with one-bit feedback.
arXiv Detail & Related papers (2023-10-18T22:34:32Z) - Learning in Repeated Multi-Unit Pay-As-Bid Auctions [3.6294895527930504]
We study the problem of bidding strategies in pay-as-bid (PAB) auctions from the perspective of a single bidder.
We show that a utility trick enables a polynomial-time algorithm to solve the offline problem, where competing bids are known in advance.
We also present additional findings on the characterization of PAB equilibria.
arXiv Detail & Related papers (2023-07-27T20:49:28Z) - Learning and Collusion in Multi-unit Auctions [17.727436775513368]
We consider repeated multi-unit auctions with uniform pricing.
We analyze the properties of this auction in both the offline and online settings.
We show that the $(K+1)$-st price format is susceptible to collusion among the bidders.
arXiv Detail & Related papers (2023-05-27T08:00:49Z) - Repeated Bilateral Trade Against a Smoothed Adversary [5.939280057673226]
We study repeated bilateral trade where an adaptive $\sigma$-smooth adversary generates the valuations of sellers and buyers.
We provide a complete characterization of the regret regimes for fixed-price mechanisms under different feedback models.
arXiv Detail & Related papers (2023-02-21T16:30:10Z) - Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics [53.62091043347035]
We study a game between autobidding algorithms that compete in an online advertising platform.
We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret.
arXiv Detail & Related papers (2023-01-30T21:59:30Z) - A Reinforcement Learning Approach in Multi-Phase Second-Price Auction
Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
Moreover, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - ProportionNet: Balancing Fairness and Revenue for Auction Design with Deep Learning [55.76903822619047]
We study the design of revenue-maximizing auctions with strong incentive guarantees.
We extend techniques for approximating auctions using deep learning to address concerns of fairness while maintaining high revenue and strong incentive guarantees.
arXiv Detail & Related papers (2020-10-13T13:54:21Z) - Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition [59.34067736545355]
We study the reinforcement learning problem in finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states, $A$ actions, and episode length $H$.
We propose a model-free algorithm UCB-Advantage and prove that it achieves $\tilde{O}(\sqrt{H^2SAT})$ regret where $T = KH$ and $K$ is the number of episodes to play.
arXiv Detail & Related papers (2020-04-21T14:00:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information above and is not responsible for any consequences of its use.