Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal
Individual Regret and Constant Communication Costs
- URL: http://arxiv.org/abs/2308.04314v1
- Date: Tue, 8 Aug 2023 15:02:50 GMT
- Title: Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal
Individual Regret and Constant Communication Costs
- Authors: Lin Yang, Xuchuang Wang, Mohammad Hajiesmaili, Lijun Zhang, John C.S.
Lui, Don Towsley
- Abstract summary: This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits.
Our algorithm achieves the best of both paradigms: optimal individual regret and constant communication costs.
- Score: 46.068883750876886
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, there has been extensive study of cooperative multi-agent
multi-armed bandits where a set of distributed agents cooperatively play the
same multi-armed bandit game. The goal is to develop bandit algorithms with the
optimal group and individual regrets and low communication between agents. The
prior work tackled this problem using two paradigms: leader-follower and fully
distributed algorithms. Prior algorithms in both paradigms achieve the optimal
group regret. The leader-follower algorithms achieve constant communication
costs but fail to achieve optimal individual regrets. The state-of-the-art
fully distributed algorithms achieve optimal individual regrets but fail to
achieve constant communication costs. This paper presents a simple yet
effective communication policy and integrates it into a learning algorithm for
cooperative bandits. Our algorithm achieves the best of both paradigms: optimal
individual regret and constant communication costs.
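As a rough, illustrative sketch of the cooperative setting described above (not the paper's communication policy), the toy simulation below runs several UCB agents on the same Bernoulli bandit and lets each agent broadcast its local statistics for an arm only when its local pull count for that arm reaches a power of two. This event-triggered rule keeps the message count logarithmic in the horizon, so it only approximates the low-communication regime; the constant-communication policy presented in the paper is different. All arm means, constants, and variable names are assumptions made for illustration.

```python
# Illustrative sketch only: M cooperating UCB agents on the same K-armed
# Bernoulli bandit. Each agent broadcasts its cumulative statistics for an
# arm only when its local pull count for that arm reaches a power of two,
# so the number of messages grows like O(M * K * log T) -- an event-triggered
# approximation of low-communication cooperation, not this paper's policy.
import math
import random

K, M, T = 5, 4, 20_000
TRUE_MEANS = [0.2, 0.4, 0.5, 0.7, 0.9]       # hypothetical Bernoulli arm means

local_n = [[0] * K for _ in range(M)]        # each agent's own pull counts
local_s = [[0.0] * K for _ in range(M)]      # each agent's own reward sums
bcast_n = [[0] * K for _ in range(M)]        # counts last broadcast by each agent
bcast_s = [[0.0] * K for _ in range(M)]      # sums last broadcast by each agent
messages = 0                                 # number of communication events

def ucb_index(agent, arm, t):
    """UCB score using the agent's own data plus other agents' broadcasts."""
    n = local_n[agent][arm] + sum(bcast_n[b][arm] for b in range(M) if b != agent)
    if n == 0:
        return float("inf")
    s = local_s[agent][arm] + sum(bcast_s[b][arm] for b in range(M) if b != agent)
    return s / n + math.sqrt(2.0 * math.log(t + 1) / n)

for t in range(T):
    for a in range(M):
        arm = max(range(K), key=lambda i: ucb_index(a, i, t))
        reward = 1.0 if random.random() < TRUE_MEANS[arm] else 0.0
        local_n[a][arm] += 1
        local_s[a][arm] += reward
        n = local_n[a][arm]
        if n & (n - 1) == 0:                 # local count hit a power of two: broadcast
            bcast_n[a][arm] = n
            bcast_s[a][arm] = local_s[a][arm]
            messages += 1

print("total communication events:", messages)
```

The sketch only makes the trade-off concrete: fewer broadcast events mean each agent's confidence bounds rely more on its own samples, which is exactly the tension between individual regret and communication cost that the paper resolves.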
Related papers
- QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits [5.530212768657544]
We study the cooperative $k$-armed bandit problem, where a network of $m$ agents collaborate to find the optimal action.
We provide a black-box reduction that allows us to extend any single-agent bandit algorithm to the multi-agent setting.
arXiv Detail & Related papers (2024-10-31T12:20:36Z) - Pure Exploration in Asynchronous Federated Bandits [57.02106627533004]
We study the federated pure exploration problem of multi-armed bandits and linear bandits, where $M$ agents cooperatively identify the best arm via communicating with the central server.
We propose the first asynchronous multi-armed bandit and linear bandit algorithms for pure exploration with fixed confidence.
arXiv Detail & Related papers (2023-10-17T06:04:00Z) - Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits [24.590517939890788]
We study a new collaborative setting, consisting of $N$ agents such that each agent is learning one of $M$ multi-armed bandits.
We develop algorithms which facilitate collaboration between the agents under two scenarios.
arXiv Detail & Related papers (2023-05-30T06:35:49Z) - On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits [7.23389716633927]
We show that a collaborative multi-agent follow-the-regularized-leader (FTRL) algorithm has an individual regret upper bound that matches the lower bound up to a constant factor.
We also show that an FTRL algorithm with a suitable regularizer is regret optimal with respect to the scaling with the edge-delay parameter.
arXiv Detail & Related papers (2022-11-30T16:46:41Z) - Private and Byzantine-Proof Cooperative Decision-Making [15.609414012418043]
The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit.
In this paper, we investigate the bandit problem under two settings - (a) when the agents wish to make their communication private with respect to the action sequence, and (b) when the agents can be byzantine.
We provide upper-confidence bound algorithms that obtain optimal regret while being (a) differentially private and (b) tolerant to Byzantine agents.
Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems.
arXiv Detail & Related papers (2022-05-27T18:03:54Z) - Distributed Bandits with Heterogeneous Agents [38.90376765616447]
This paper tackles a multi-agent bandit setting where $M$ agents cooperate together to solve the same instance of a $K$-armed bandit problem.
We propose two learning algorithms, UCBO and AAE.
We prove that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tilde{\Delta}_i>0} \log T / \tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of
arXiv Detail & Related papers (2022-01-23T20:04:15Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem (see the sketch after this list).
arXiv Detail & Related papers (2020-12-24T05:36:29Z) - Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
We also show that the regret depends on the gap between the highest reward and the other rewards.
arXiv Detail & Related papers (2020-06-16T15:33:12Z)
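Following the pointer in the "Upper Confidence Bounds for Combining Stochastic Bandits" entry above, here is a minimal sketch of the meta-UCB idea: treat $N$ base bandit algorithms as the arms of a higher-level $N$-armed bandit. Using plain UCB1 at both levels, as below, is only a naive illustration; treating a learning algorithm as an arm makes that "arm" non-stationary, which the actual meta-UCB procedure is designed to handle. The base learners, constants, and reward model are assumptions for illustration, not the paper's exact procedure.

```python
# Naive sketch of the meta-UCB idea: a higher-level UCB1 chooses which of N
# base UCB1 learners acts at each round, and the observed reward updates both
# levels. All constants and the Bernoulli reward model are illustrative.
import math
import random

class UCB1:
    """Standard UCB1 learner over k arms."""
    def __init__(self, k):
        self.n = [0] * k          # pull counts per arm
        self.s = [0.0] * k        # reward sums per arm
        self.t = 0                # rounds played

    def select(self):
        self.t += 1
        for i, c in enumerate(self.n):   # play every arm once first
            if c == 0:
                return i
        return max(range(len(self.n)),
                   key=lambda i: self.s[i] / self.n[i]
                   + math.sqrt(2.0 * math.log(self.t) / self.n[i]))

    def update(self, arm, reward):
        self.n[arm] += 1
        self.s[arm] += reward

TRUE_MEANS = [0.3, 0.5, 0.8]                       # hypothetical Bernoulli arms
base = [UCB1(len(TRUE_MEANS)) for _ in range(4)]   # N = 4 base bandit algorithms
meta = UCB1(len(base))                             # higher-level N-armed bandit

for _ in range(10_000):
    j = meta.select()               # meta level: pick which base algorithm to run
    arm = base[j].select()          # base level: that algorithm picks an arm
    r = 1.0 if random.random() < TRUE_MEANS[arm] else 0.0
    base[j].update(arm, r)          # reward feeds back to both levels
    meta.update(j, r)
```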
This list is automatically generated from the titles and abstracts of the papers on this site.