Related papers: Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach

URL: http://arxiv.org/abs/2602.06404v1
Date: Fri, 06 Feb 2026 05:53:38 GMT
Title: Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach
Authors: Hao Qiu, Mengxiao Zhang, Nicolò Cesa-Bianchi
Abstract summary: We study distributed adversarial bandits, where $N$ agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is $\tilde{\Theta}(\sqrt{(\rho^{-1/2}+K/N)T})$, where $T$ is the horizon, $K$ is the number of actions, and $\rho$ is the spectral gap of the communication matrix.
Score: 26.085126064745378
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study distributed adversarial bandits, where $N$ agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is $\tilde{\Theta}(\sqrt{(\rho^{-1/2}+K/N)T})$, where $T$ is the horizon, $K$ is the number of actions, and $\rho$ is the spectral gap of the communication matrix. Our algorithm, based on a novel black-box reduction to bandits with delayed feedback, requires agents to communicate only through gossip. It achieves an upper bound that significantly improves over the previous best bound $\tilde{O}(\rho^{-1/3}(KT)^{2/3})$ of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost $\rho^{-1/4}\sqrt{T}$ and a bandit cost $\sqrt{KT/N}$. We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in $\mathbb{R}^d$, obtaining a regret bound of $\tilde{O}(\sqrt{(\rho^{-1/2}+1/N)dT})$, achieved with only $O(d)$ communication cost per agent and per round via a volumetric spanner.
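A small numerical sketch can make the quantities in the abstract concrete. It does not implement the paper's black-box reduction to delayed-feedback bandits. Instead it assumes a hypothetical ring of $N \ge 3$ agents whose gossip matrix gives weight $1/3$ to each agent and its two neighbours, takes the spectral gap to be $\rho = 1 - \max_{k \neq 0}|\lambda_k|$ (definitions vary across papers), and drops constants and logarithmic factors from both regret bounds. The helper names (`spectral_gap`, `gossip_round`, `regret_bound`, `previous_bound`, `cost_split`) are illustrative only.

```python
import math


def ring_gossip_eigenvalues(n: int) -> list[float]:
    # Hypothetical gossip matrix W for a ring of n >= 3 agents:
    # W[i][i] = W[i][i-1] = W[i][i+1] = 1/3 (symmetric, doubly stochastic).
    # W is circulant, so its eigenvalues are 1/3 + (2/3) cos(2*pi*k/n), k = 0..n-1.
    if n < 3:
        raise ValueError("ring needs at least 3 agents")
    return [1 / 3 + (2 / 3) * math.cos(2 * math.pi * k / n) for k in range(n)]


def spectral_gap(n: int) -> float:
    # Assumed definition: rho = 1 - max |lambda_k| over k != 0
    # (k = 0 is the eigenvalue 1 with the all-ones eigenvector).
    eigs = ring_gossip_eigenvalues(n)
    return 1 - max(abs(lam) for lam in eigs[1:])


def gossip_round(values: list[float]) -> list[float]:
    # One synchronous gossip round x <- W x: each agent averages its own
    # value with those of its two ring neighbours (values[-1] wraps around).
    n = len(values)
    return [(values[i - 1] + values[i] + values[(i + 1) % n]) / 3 for i in range(n)]


def regret_bound(rho: float, k: int, n: int, t: int) -> float:
    # sqrt((rho^{-1/2} + K/N) T), constants and log factors dropped.
    return math.sqrt((rho ** -0.5 + k / n) * t)


def previous_bound(rho: float, k: int, t: int) -> float:
    # rho^{-1/3} (K T)^{2/3}, the earlier bound, constants and logs dropped.
    return rho ** (-1 / 3) * (k * t) ** (2 / 3)


def cost_split(rho: float, k: int, n: int, t: int) -> tuple[float, float]:
    # Communication cost rho^{-1/4} sqrt(T) and bandit cost sqrt(K T / N).
    return rho ** -0.25 * math.sqrt(t), math.sqrt(k * t / n)


if __name__ == "__main__":
    n_agents, n_actions, horizon = 16, 10, 10**6
    rho = spectral_gap(n_agents)

    # Gossip drives every agent's value towards the network average; the
    # Euclidean distance to the consensus vector shrinks by at least a
    # factor (1 - rho) per round.
    x = [float(i) for i in range(n_agents)]
    mean = sum(x) / n_agents
    for _ in range(200):
        x = gossip_round(x)
    print("rho =", rho)
    print("max deviation after 200 rounds =", max(abs(v - mean) for v in x))

    comm, bandit = cost_split(rho, n_actions, n_agents, horizon)
    print("new bound      ", regret_bound(rho, n_actions, n_agents, horizon))
    print("comm + bandit  ", comm + bandit)
    print("previous bound ", previous_bound(rho, n_actions, horizon))
```

The closed-form circulant eigenvalues avoid a linear-algebra dependency; any symmetric doubly stochastic matrix would do in their place. Since $\sqrt{a+b} \le \sqrt{a}+\sqrt{b} \le \sqrt{2(a+b)}$, the communication cost $\rho^{-1/4}\sqrt{T}$ plus the bandit cost $\sqrt{KT/N}$ stays within a factor $\sqrt{2}$ of $\sqrt{(\rho^{-1/2}+K/N)T}$, which is why the bound can be read as the sum of the two costs.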

Related papers

Distributed Online Convex Optimization with Efficient Communication: Improved Algorithm and Lower bounds [27.851263935083736]
We investigate distributed online convex optimization with compressed communication. We propose a novel algorithm that achieves improved regret bounds of $\tilde{O}(\omega^{-1/2}\rho^{-1}n\sqrt{T})$ and $\tilde{O}(\omega^{-1}\rho^{-2}n\ln T)$ for convex and strongly convex functions, respectively.
(2026-01-08T13:05:36Z)
Distributed Multi-Agent Bandits Over Erdős-Rényi Random Networks [16.410227280444285]
We study the distributed multi-agent multi-armed bandit problem with heterogeneous rewards over random communication graphs. We propose a fully distributed algorithm that integrates the arm elimination strategy with the random gossip algorithm.
(2025-10-26T19:53:52Z)
Bandit Max-Min Fair Allocation [30.8580020414087]
We study a new decision-making problem called the bandit max-min fair allocation (BMMFA) problem. The goal of this problem is to maximize the minimum utility among agents with additive valuations. One key feature of this problem is that each agent's valuation for each item can only be observed through semi-bandit feedback.
(2025-05-08T12:09:20Z)
Low-rank Matrix Bandits with Heavy-tailed Rewards [55.03293214439741]
We study the problem of low-rank matrix bandits with heavy-tailed rewards (LowHTR). By utilizing truncation on observed payoffs and dynamic exploration, we propose a novel algorithm called LOTUS.
(2024-04-26T21:54:31Z)
Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the problem of adversarial bandits with a switching cost $\lambda$ for each switch of a selected arm in each round. We derive lower bounds for the minimax regret and design algorithms to approach them.
(2024-04-02T12:15:37Z)
Best-of-Both-Worlds Linear Contextual Bandits [45.378265414553226]
This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption. We develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees. We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both stochastic and adversarial regimes.
(2023-12-27T09:32:18Z)
Federated Linear Bandits with Finite Adversarial Actions [20.1041278044797]
We study a federated linear bandits model, where $M$ clients communicate with a central server to solve a linear contextual bandits problem. To address the unique challenges of adversarial finite action sets, we propose the FedSupLinUCB algorithm. We prove that FedSupLinUCB achieves a total regret of $\tilde{O}(\sqrt{dT})$, where $T$ is the total number of arm pulls from all clients, and $d$ is the ambient dimension of the linear model.
(2023-11-02T03:41:58Z)
Context-lumpable stochastic bandits [49.024050919419366]
We consider a contextual bandit problem with $S$ contexts and $K$ actions. We give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde{O}(r(S+K)/\epsilon^2)$ samples. In the regret setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde{O}(\sqrt{r^3(S+K)T})$.
(2023-06-22T17:20:30Z)
Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR [58.40575099910538]
We study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. We show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes. We show that our algorithm achieves the optimal regret of $\widetilde{O}(\tau^{-1}\sqrt{SAK})$ under a continuity assumption and in general attains a near
(2023-02-07T02:22:31Z)
Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly. The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon. Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long
(2022-02-12T21:55:44Z)
Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies. We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
(2020-06-17T22:32:19Z)
