Related papers: Better Regret Rates in Bilateral Trade via Sublinear Budget Violation

Better Regret Rates in Bilateral Trade via Sublinear Budget Violation

URL: http://arxiv.org/abs/2507.11419v1
Date: Tue, 15 Jul 2025 15:45:36 GMT
Title: Better Regret Rates in Bilateral Trade via Sublinear Budget Violation
Authors: Anna Lunghi, Matteo Castiglioni, Alberto Marchesi,
Abstract summary: We study how the optimal regret rate varies with the allowed violation of the global budget balance constraint.<n>We complement this result with a matching lower bound, thus fully characterizing the trade-off between regret and budget violation.
Score: 28.647867626280316
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bilateral trade is a central problem in algorithmic economics, and recent work has explored how to design trading mechanisms using no-regret learning algorithms. However, no-regret learning is impossible when budget balance has to be enforced at each time step. Bernasconi et al. [Ber+24] show how this impossibility can be circumvented by relaxing the budget balance constraint to hold only globally over all time steps. In particular, they design an algorithm achieving regret of the order of $\tilde O(T^{3/4})$ and provide a lower bound of $\Omega(T^{5/7})$. In this work, we interpolate between these two extremes by studying how the optimal regret rate varies with the allowed violation of the global budget balance constraint. Specifically, we design an algorithm that, by violating the constraint by at most $T^{\beta}$ for any given $\beta \in [\frac{3}{4}, \frac{6}{7}]$, attains regret $\tilde O(T^{1 - \beta/3})$. We complement this result with a matching lower bound, thus fully characterizing the trade-off between regret and budget violation. Our results show that both the $\tilde O(T^{3/4})$ upper bound in the global budget balance case and the $\Omega(T^{5/7})$ lower bound under unconstrained budget balance violation obtained by Bernasconi et al. [Ber+24] are tight.

Related papers

Near-Optimal Regret for KL-Regularized Multi-Armed Bandits [54.77408659142336]
We study the statistical efficiency of online learning with respect to KL-regularized objectives.<n>We show that the KL-regularized regret for MABs is $$-independent and scales as $tilde(sqrtKT)$.
arXiv Detail & Related papers (2026-03-02T18:17:33Z)
Tight Regret Bounds for Fixed-Price Bilateral Trade [10.237210183229555]
We show a near-optimal $widetildeTheta(T2/3)$ tight bound for $textsfGlobal Budget Balance$ fixed-price mechanisms with two-bit/one-bit feedback.<n>For correlated/adversarial values, a near-optimal $Omega(T3/4)$ lower bound for $textsfGlobal Budget Balance$ fixed-price mechanisms with two-bit/one-bit feedback.
arXiv Detail & Related papers (2025-04-06T03:56:42Z)
p-Mean Regret for Stochastic Bandits [52.828710025519996]
We introduce a simple, unified UCB-based algorithm that achieves novel $p$-mean regret bounds.<n>Our framework encompasses both average cumulative regret and Nash regret as special cases.
arXiv Detail & Related papers (2024-12-14T08:38:26Z)
Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the problem of adversarial bandit with a switching cost $lambda$ for a switch of each selected arm in each round. We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z)
Improved Regret for Bandit Convex Optimization with Delayed Feedback [50.46856739179311]
bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed under a delay. We develop a novel algorithm, and prove that it enjoys a regret bound of $O(sqrtnT3/4+sqrtdT)$ in general. We show that the proposed algorithm can improve the regret bound to $O((nT)2/3log/3T+dlog T)$ for strongly convex functions.
arXiv Detail & Related papers (2024-02-14T13:08:26Z)
Federated Linear Bandits with Finite Adversarial Actions [20.1041278044797]
We study a federated linear bandits model, where $M$ clients communicate with a central server to solve a linear contextual bandits problem. To address the unique challenges of adversarial finite action sets, we propose the FedSupLinUCB algorithm. We prove that FedSupLinUCB achieves a total regret of $tildeO(sqrtd T)$, where $T$ is the total number of arm pulls from all clients, and $d$ is the ambient dimension of the linear model.
arXiv Detail & Related papers (2023-11-02T03:41:58Z)
No-Regret Learning in Bilateral Trade via Global Budget Balance [29.514323697659613]
We provide the first no-regret algorithms for adversarial bilateral trade under various feedback models. We show that in the full-feedback model, the learner can guarantee $tilde O(sqrtT)$ regret against the best fixed prices in hindsight. We also provide a learning algorithm guaranteeing a $tilde O(T3/4)$ regret upper bound with one-bit feedback.
arXiv Detail & Related papers (2023-10-18T22:34:32Z)
Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score. We propose a rich class of generalized linear dueling bandit models, which cover many existing models. Our algorithm achieves an $tildeO(d2/3 T2/3)$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
Better Best of Both Worlds Bounds for Bandits with Switching Costs [37.71741656687868]
We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer, Seldin and Cesa-Bianchi, 2021. We introduce a surprisingly simple and effective that simultaneously achieves minimax optimal regret bound of $mathcalO(T2/3)$ in the oblivious adversarial setting.
arXiv Detail & Related papers (2022-06-07T08:22:56Z)
The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL) We focus on learning with general function classes and general model classes. We show that a logarithmic regret bound is realizable by algorithms with $O(log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z)
Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap [84.66885506098724]
This paper presents a new model-free algorithm for episodic finite-horizon Markov Decision Processes (MDP), Adaptive Multi-step Bootstrap (AMB) We show AMB achieves a gap-dependent regret bound that only scales with the sum of the inverse of the sub-optimality gaps. We also show AMB suffers an additional $frac|Z_mul|Delta_min$ regret, where $Z_mul$ is the set of state-action pairs $(s,a)$'s satisfying $a$ is a non-unique optimal action for
arXiv Detail & Related papers (2021-02-09T07:46:34Z)
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$2$UCB algorithm has the optimal regret bound $tildeO(dsqrtkT + dk)$ for the partition matroid constraints. For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
Near-optimal Regret Bounds for Stochastic Shortest Path [63.029132134792555]
shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. We show that any learning algorithm must have at least $Omega(B_star sqrt|S| |A| K)$ regret in the worst case.
arXiv Detail & Related papers (2020-02-23T09:10:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.