Asymptotic Optimality for Decentralised Bandits
- URL: http://arxiv.org/abs/2109.09427v1
- Date: Mon, 20 Sep 2021 11:10:10 GMT
- Title: Asymptotic Optimality for Decentralised Bandits
- Authors: Conor Newton, Ayalvadi Ganesh and Henry W. J. Reeve
- Abstract summary: We consider a large number of agents collaborating on a bandit problem with a large number of arms.
The goal is to minimise the regret of each agent in a communication-constrained setting.
- Score: 7.057937612386993
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a large number of agents collaborating on a multi-armed bandit
problem with a large number of arms. The goal is to minimise the regret of each
agent in a communication-constrained setting. We present a decentralised
algorithm which builds upon and improves the Gossip-Insert-Eliminate method of
Chawla et al. (arXiv:2001.05452). We provide a theoretical analysis of the regret
incurred which shows that our algorithm is asymptotically optimal. In fact, our
regret guarantee matches the asymptotically optimal rate achievable in the full
communication setting. Finally, we present empirical results which support our
conclusions.
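The gossip-insert-eliminate pattern the paper builds on can be pictured with a short simulation sketch. Everything below (the class name, gossip period, active-set size, and UCB1 index) is an illustrative assumption rather than the authors' algorithm: each agent runs UCB over a small active set of arms, periodically gossips its empirically best arm to a random neighbour, inserts what it receives, and eliminates its worst arm to keep the active set small.

```python
import math
import random

def ucb_index(mean, count, t):
    # UCB1-style index; the exploration constant is an illustrative choice.
    return mean + math.sqrt(2 * math.log(t) / count)

class GossipAgent:
    """Toy agent: UCB over a small active set, plus insert/eliminate."""
    def __init__(self, active_arms):
        self.active = set(active_arms)
        self.counts = {a: 0 for a in active_arms}
        self.means = {a: 0.0 for a in active_arms}

    def pull(self, t, reward_fn):
        # Try each active arm once, then follow the highest UCB index.
        untried = [a for a in self.active if self.counts[a] == 0]
        arm = untried[0] if untried else max(
            self.active, key=lambda a: ucb_index(self.means[a], self.counts[a], t))
        r = reward_fn(arm)
        self.counts[arm] += 1
        self.means[arm] += (r - self.means[arm]) / self.counts[arm]

    def recommend(self):
        # Empirically best arm, to be gossiped to a neighbour.
        return max(self.active, key=lambda a: self.means[a])

    def insert_and_eliminate(self, arm):
        # Insert a received recommendation; evict the worst arm to keep
        # the active set (and hence the exploration cost) small.
        if arm not in self.active:
            self.counts.setdefault(arm, 0)
            self.means.setdefault(arm, 0.0)
            self.active.add(arm)
            worst = min(self.active - {arm}, key=lambda a: self.means[a])
            self.active.discard(worst)

# Toy run: 4 agents, 20 Bernoulli arms, gossip every 50 rounds.
random.seed(0)
mu = [random.random() for _ in range(20)]
agents = [GossipAgent(random.sample(range(20), 5)) for _ in range(4)]
for t in range(1, 2001):
    for ag in agents:
        ag.pull(t, lambda a: float(random.random() < mu[a]))
    if t % 50 == 0:
        for ag in agents:
            random.choice(agents).insert_and_eliminate(ag.recommend())
print(max(range(20), key=lambda a: mu[a]), [ag.recommend() for ag in agents])
```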
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (a less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
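As a toy illustration of the fidelity/cost trade-off (the fidelity table and noise model below are hypothetical, not from the paper): the quantity the algorithm must minimise is total sampling cost, so cheap but biased low-fidelity pulls can be used to screen arms.

```python
import random

# Hypothetical fidelity table: lower fidelity costs less, but its mean
# estimate may be shifted by an unknown amount within `bias`.
FIDELITIES = [
    {"cost": 1.0, "bias": 0.00},  # full fidelity: unbiased
    {"cost": 0.2, "bias": 0.05},  # cheap screening fidelity
]

def sample(true_mean, shift):
    # Observation at the chosen fidelity; `shift` is the unknown bias.
    return random.gauss(true_mean + shift, 0.5)

random.seed(0)
fid = FIDELITIES[1]
shift = random.uniform(-fid["bias"], fid["bias"])
obs = [sample(0.7, shift) for _ in range(100)]
total_cost = 100 * fid["cost"]  # the objective is cost, not pull count
print(total_cost, sum(obs) / len(obs))
```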
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent not only selects an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
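One way to picture the abstention option, as a toy rule rather than the paper's algorithm (the fixed abstention payoff below is a hypothetical model choice): abstain whenever the arm's estimated mean falls below the payoff for abstaining.

```python
import random

ABSTAIN_REWARD = 0.4  # hypothetical fixed payoff for abstaining

def step(mean_estimate, draw_reward):
    # The accept/abstain decision is made BEFORE the reward is seen.
    accept = mean_estimate >= ABSTAIN_REWARD  # toy threshold rule
    realized = draw_reward()                  # the reward is drawn either way
    return realized if accept else ABSTAIN_REWARD

random.seed(0)
print(step(0.6, lambda: float(random.random() < 0.6)))  # accepts the drawn reward
print(step(0.2, lambda: float(random.random() < 0.2)))  # abstains, returns 0.4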
arXiv Detail & Related papers (2024-02-23T06:27:12Z) - On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits [7.23389716633927]
We show that a collaborative multi-agent follow-the-regularized-leader (FTRL) algorithm has an individual regret upper bound that matches the lower bound up to a constant factor.
We also show that an FTRL algorithm with a suitable regularizer is regret optimal with respect to the scaling with the edge-delay parameter.
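For orientation, here is a generic FTRL update with a negative-entropy regularizer, whose closed form is exponential weights; the cooperative and delay-aware aspects of the paper are omitted from this sketch.

```python
import math

def ftrl_exp_weights(loss_sums, eta):
    """FTRL with a negative-entropy regularizer: the closed-form
    solution is exponential weights, p_i proportional to exp(-eta * L_i),
    where L_i is arm i's cumulative loss."""
    m = min(loss_sums)  # subtract the min for numerical stability
    w = [math.exp(-eta * (L - m)) for L in loss_sums]
    z = sum(w)
    return [x / z for x in w]

# Cumulative losses of three arms after some rounds.
print(ftrl_exp_weights([10.0, 12.0, 11.0], eta=0.5))
```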
arXiv Detail & Related papers (2022-11-30T16:46:41Z) - Federated Best Arm Identification with Heterogeneous Clients [62.36929749450298]
We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients.
We consider algorithms whose upper bound on the expected stopping time matches the lower bound up to a multiplicative constant (almost-optimal algorithms).
We propose a novel algorithm that communicates at exponential time instants, and demonstrate that it is almost-optimal.
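"Exponential time instants" means the clients sync with the server only at geometrically spaced rounds. A minimal sketch of such a schedule (the helper below is an assumption, not the paper's code):

```python
def communication_times(horizon, base=2):
    # Rounds at which clients sync with the server: 1, 2, 4, 8, ...
    # Only O(log T) messages are sent over a horizon of T rounds.
    t = 1
    while t <= horizon:
        yield t
        t *= base

print(list(communication_times(1000)))  # [1, 2, 4, ..., 512]
```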
arXiv Detail & Related papers (2022-10-14T13:09:11Z) - Decentralized Multi-Armed Bandit Can Outperform Classic Upper Confidence Bound: A Homogeneous Case over Strongly Connected Graphs [9.84486119211443]
This paper studies a homogeneous decentralized multi-armed bandit problem, in which a network of multiple agents faces the same set of arms.
A fully decentralized upper confidence bound (UCB) algorithm is proposed for a multi-agent network whose neighbor relations are described by a directed graph.
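A minimal sketch (not the paper's exact update) of the ingredient that lets a networked UCB beat the single-agent one: each agent pools reward statistics from its in-neighbours before computing its index, which shrinks the confidence radius.

```python
import math

def networked_ucb_index(own, neighbors, t):
    """Pool (sum, count) reward statistics from in-neighbours, then
    apply a UCB1-style index to the pooled estimate. The larger pooled
    count is the source of the speed-up over single-agent UCB."""
    s = own["sum"] + sum(n["sum"] for n in neighbors)
    c = own["count"] + sum(n["count"] for n in neighbors)
    return float("inf") if c == 0 else s / c + math.sqrt(2 * math.log(t) / c)

own = {"sum": 5.0, "count": 10}
neighbors = [{"sum": 12.0, "count": 20}, {"sum": 6.0, "count": 10}]
print(networked_ucb_index(own, neighbors, t=100))
```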
arXiv Detail & Related papers (2021-11-22T01:05:52Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Bayesian Algorithms for Decentralized Stochastic Bandits [12.350564981588063]
We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network.
In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps.
The goal is to minimize cumulative regret averaged over the entire network.
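The per-agent Bayesian update can be pictured with single-agent Thompson sampling on Bernoulli arms; the decentralized mixing of posteriors across the network is omitted, and the Beta/Bernoulli model is an illustrative assumption.

```python
import random

def thompson_step(alpha, beta, reward_fn):
    # Beta(alpha[a], beta[a]) is the posterior on arm a's mean;
    # sample from each posterior and play the argmax.
    draws = [random.betavariate(alpha[a], beta[a]) for a in range(len(alpha))]
    arm = max(range(len(alpha)), key=lambda a: draws[a])
    r = reward_fn(arm)          # Bernoulli reward in {0, 1}
    alpha[arm] += r             # posterior update: success count
    beta[arm] += 1 - r          # posterior update: failure count
    return arm, r

random.seed(0)
mu = [0.3, 0.5, 0.7]
alpha, beta = [1.0] * 3, [1.0] * 3
for _ in range(500):
    thompson_step(alpha, beta, lambda a: float(random.random() < mu[a]))
print([a / (a + b) for a, b in zip(alpha, beta)])  # posterior means
```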
arXiv Detail & Related papers (2020-10-20T19:14:20Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and that of the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some $\epsilon$.
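The lenient criterion translates directly to code. A sketch of the abstract's definition (the specific gap threshold and counts below are made-up inputs): gaps at most epsilon contribute nothing to the regret.

```python
def lenient_regret(means, pulls, eps):
    # Gaps at most eps contribute nothing; larger gaps count as usual.
    best = max(means)
    return sum((best - m) * n
               for m, n in zip(means, pulls)
               if best - m > eps)

means = [0.90, 0.88, 0.60]   # gaps: 0, 0.02, 0.30
pulls = [500, 400, 100]
print(lenient_regret(means, pulls, eps=0.05))  # only the 0.30 gap counts: 30.0
```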
arXiv Detail & Related papers (2020-08-10T08:30:52Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)