Related papers: No-Regret Learning in Bilateral Trade via Global Budget Balance

No-Regret Learning in Bilateral Trade via Global Budget Balance

URL: http://arxiv.org/abs/2310.12370v2
Date: Wed, 27 Mar 2024 13:44:21 GMT
Title: No-Regret Learning in Bilateral Trade via Global Budget Balance
Authors: Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Federico Fusco,
Abstract summary: We provide the first no-regret algorithms for adversarial bilateral trade under various feedback models. We show that in the full-feedback model, the learner can guarantee $tilde O(sqrtT)$ regret against the best fixed prices in hindsight. We also provide a learning algorithm guaranteeing a $tilde O(T3/4)$ regret upper bound with one-bit feedback.
Score: 29.514323697659613
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bilateral trade models the problem of intermediating between two rational agents -- a seller and a buyer -- both characterized by a private valuation for an item they want to trade. We study the online learning version of the problem, in which at each time step a new seller and buyer arrive and the learner has to set prices for them without any knowledge about their (adversarially generated) valuations. In this setting, known impossibility results rule out the existence of no-regret algorithms when budget balanced has to be enforced at each time step. In this paper, we introduce the notion of \emph{global budget balance}, which only requires the learner to fulfill budget balance over the entire time horizon. Under this natural relaxation, we provide the first no-regret algorithms for adversarial bilateral trade under various feedback models. First, we show that in the full-feedback model, the learner can guarantee $\tilde O(\sqrt{T})$ regret against the best fixed prices in hindsight, and that this bound is optimal up to poly-logarithmic terms. Second, we provide a learning algorithm guaranteeing a $\tilde O(T^{3/4})$ regret upper bound with one-bit feedback, which we complement with a $\Omega(T^{5/7})$ lower bound that holds even in the two-bit feedback model. Finally, we introduce and analyze an alternative benchmark that is provably stronger than the best fixed prices in hindsight and is inspired by the literature on bandits with knapsacks.

Related papers

Improved learning rates in multi-unit uniform price auctions [20.8319469276025]
We study the problem of online learning in repeated multi-unit uniform price auctions focusing on the adversarial opposing bid setting. We prove that a learning algorithm leveraging the structure of this problem achieves a regret of $tildeO(K4/3T2/3)$ under bandit feedback. Inspired by electricity reserve markets, we introduce a different feedback model under which all winning bids are revealed.
arXiv Detail & Related papers (2025-01-17T13:26:12Z)
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM) We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose an algorithm namely robust contextual dueling bandit (algo), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
An Online Learning Theory of Brokerage [3.8059763597999012]
We investigate brokerage between traders from an online learning perspective. Unlike other bilateral trade problems already studied, we focus on the case where there are no designated buyer and seller roles. We show that the optimal rate degrades to $sqrtT$ in the first case, and the problem becomes unlearnable in the second.
arXiv Detail & Related papers (2023-10-18T17:01:32Z)
Online Learning in Contextual Second-Price Pay-Per-Click Auctions [47.06746975822902]
We study online learning in contextual pay-per-click auctions where at each of the $T$ rounds, the learner receives some context along with a set of ads. The learner's goal is to minimize her regret, defined as the gap between her total revenue and that of an oracle strategy.
arXiv Detail & Related papers (2023-10-08T07:04:22Z)
Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score. We propose a rich class of generalized linear dueling bandit models, which cover many existing models. Our algorithm achieves an $tildeO(d2/3 T2/3)$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
Repeated Bilateral Trade Against a Smoothed Adversary [5.939280057673226]
We study repeated bilateral trade where an adaptive $sigma$-smooth adversary generates the valuations of sellers and buyers. We provide a complete characterization of the regret regimes for fixed-price mechanisms under different feedback models.
arXiv Detail & Related papers (2023-02-21T16:30:10Z)
Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics [53.62091043347035]
We study a game between autobidding algorithms that compete in an online advertising platform. We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret.
arXiv Detail & Related papers (2023-01-30T21:59:30Z)
A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions. From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders. Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z)
An $α$-regret analysis of Adversarial Bilateral Trade [10.275531964940425]
We study sequential bilateral trade where sellers and buyers valuations are completely arbitrary. We consider gain from trade which is harder to approximate than social welfare.
arXiv Detail & Related papers (2022-10-13T08:57:30Z)
The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL) We focus on learning with general function classes and general model classes. We show that a logarithmic regret bound is realizable by algorithms with $O(log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z)
A Regret Analysis of Bilateral Trade [5.031063690574698]
We cast for the first time the bilateral trade problem in a regret minimization framework over rounds of seller/buyer interactions. Our main contribution is a complete characterization of the regret regimes for fixedprice mechanisms with different models of feedback and private valuations.
arXiv Detail & Related papers (2021-02-16T08:53:17Z)
Regret Minimization in Stochastic Contextual Dueling Bandits [40.17224226373741]
We consider the problem of $K$-armed dueling bandit in the contextual setting. We present two algorithms for the setup with respective regret guarantees.
arXiv Detail & Related papers (2020-02-20T06:36:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.