Distributed Bandits: Probabilistic Communication on $d$-regular Graphs
- URL: http://arxiv.org/abs/2011.07720v2
- Date: Sat, 9 Oct 2021 00:51:56 GMT
- Title: Distributed Bandits: Probabilistic Communication on $d$-regular Graphs
- Authors: Udari Madhushani and Naomi Ehrich Leonard
- Abstract summary: We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability $p$ over a network defined by a $d$-regular graph.
We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret.
- Score: 5.33024001730262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the decentralized multi-agent multi-armed bandit problem for agents
that communicate with probability $p$ over a network defined by a $d$-regular
graph. Every edge in the graph has probabilistic weight $p$ to account for the
($1\!-\!p$) probability of a communication link failure. At each time step,
each agent chooses an arm and receives a numerical reward associated with the
chosen arm. After each choice, each agent observes the last obtained reward of
each of its neighbors with probability $p$. We propose a new Upper Confidence
Bound (UCB) based algorithm and analyze how agent-based strategies contribute
to minimizing group regret in this probabilistic communication setting. We
provide theoretical guarantees that our algorithm outperforms state-of-the-art
algorithms. We illustrate our results and validate the theoretical claims using
numerical simulations.
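As a point of reference for the setting described in the abstract, the sketch below simulates decentralized UCB agents whose neighbor observations arrive with probability $p$. It is a minimal illustration, not the authors' algorithm: the ring network (a $2$-regular graph), the problem sizes, and the vanilla UCB1 index are assumptions made here for concreteness; the paper's algorithm modifies the confidence bound to account for the probabilistic communication.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setting (values assumed, not from the paper):
# N agents on a d-regular graph (here a ring, d = 2), K Bernoulli arms,
# horizon T, and each communication link succeeding with probability p.
N, K, T, p = 8, 5, 2000, 0.7
mu = rng.uniform(0.1, 0.9, size=K)            # arm means, shared by all agents
neighbors = [((i - 1) % N, (i + 1) % N) for i in range(N)]

counts = np.ones((N, K))                      # pseudo-count avoids div-by-zero
sums = np.zeros((N, K))                       # pooled reward sums (own + observed)
group_regret = 0.0

for t in range(1, T + 1):
    arms = np.empty(N, dtype=int)
    rewards = np.empty(N)
    for i in range(N):
        # Vanilla UCB1 index on the agent's pooled statistics.
        ucb = sums[i] / counts[i] + np.sqrt(2.0 * np.log(t) / counts[i])
        arms[i] = int(np.argmax(ucb))
        rewards[i] = rng.binomial(1, mu[arms[i]])
        group_regret += mu.max() - mu[arms[i]]
    for i in range(N):
        counts[i, arms[i]] += 1               # an agent always sees its own sample
        sums[i, arms[i]] += rewards[i]
        for j in neighbors[i]:
            if rng.random() < p:              # link succeeded: observe neighbor's
                counts[i, arms[j]] += 1       # last (arm, reward) pair
                sums[i, arms[j]] += rewards[j]

print(f"group regret after {T} steps: {group_regret:.1f}")
```

Larger $p$ pools more samples per agent and should lower group regret; rerunning the sketch with $p \in \{0, 0.5, 1\}$ makes the effect visible.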
Related papers
- Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
$N$ agents communicate locally over a network to minimize their collective regret while keeping their expected cost under a specified threshold $\tau$.
We propose a safe distributed upper confidence bound algorithm, called MA-OPLB, and establish a high-probability bound on its $T$-round regret.
We show that our regret bound is of order $\mathcal{O}\left(\frac{d}{\tau-c_0}\,\frac{\log(NT)^2}{\sqrt{N}}\,\sqrt{T\log(1/|\lambda|)}\right)$.
arXiv Detail & Related papers (2024-10-22T19:34:53Z) - A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Cooperative Multi-Agent Graph Bandits: UCB Algorithm and Regret Analysis [5.02063914741425]
We formulate the multi-agent graph bandit problem as a multi-agent extension of the graph bandit problem introduced by Zhang, Johansson, and Li.
We propose an Upper Confidence Bound (UCB)-based learning algorithm, Multi-G-UCB, and prove that its expected regret over $T$ steps is bounded by $O(\gamma N\log(T)[\sqrt{KT} + DK])$.
arXiv Detail & Related papers (2024-01-18T21:36:17Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework in which, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a stochastic and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards [14.822625665220068]
We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment.
The goal is to minimize the overall regret of the entire system through collaborations.
arXiv Detail & Related papers (2023-06-08T22:37:24Z) - Federated Best Arm Identification with Heterogeneous Clients [62.36929749450298]
We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients.
An algorithm is almost-optimal if its upper bound on the expected stopping time matches the lower bound up to a multiplicative constant.
We propose a novel algorithm that communicates at exponential time instants, and demonstrate that it is almost-optimal.
arXiv Detail & Related papers (2022-10-14T13:09:11Z) - QuTE: decentralized multiple testing on sensor networks with false discovery rate control [130.7122910646076]
This paper designs methods for decentralized multiple hypothesis testing on graphs, with provable guarantees on the false discovery rate (FDR).
We consider the setting where distinct agents reside on the nodes of an undirected graph, and each agent possesses p-values corresponding to one or more hypotheses local to its node.
Each agent must individually decide whether to reject one or more of its local hypotheses by only communicating with its neighbors, with the joint aim that the global FDR over the entire graph must be controlled at a predefined level.
arXiv Detail & Related papers (2022-10-09T19:48:39Z) - Collaborative Multi-agent Stochastic Linear Bandits [28.268809091816287]
We study a collaborative multi-agent linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret.
All the agents observe the corresponding rewards of the played actions and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents.
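The accelerated consensus step in the entry above is the key communication primitive. The sketch below shows the plain (unaccelerated) version, assuming an illustrative ring of agents with a doubly stochastic weight matrix; the paper's accelerated variant is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative network: 6 agents on a ring with lazy, doubly stochastic weights.
N = 6
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = 0.25
    W[i, (i + 1) % N] = 0.25

x = rng.normal(size=N)       # each agent's locally observed reward statistic
target = x.mean()            # the network-wide average every agent wants

for _ in range(50):          # plain consensus iteration: x <- W x
    x = W @ x

print(np.max(np.abs(x - target)))   # every agent is now close to the average
```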
arXiv Detail & Related papers (2022-05-12T19:46:35Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
The proposed algorithm, BOC, includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as a subroutine.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Bayesian Algorithms for Decentralized Stochastic Bandits [12.350564981588063]
We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network.
In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps.
The goal is to minimize cumulative regret averaged over the entire network.
arXiv Detail & Related papers (2020-10-20T19:14:20Z)
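The Bayesian entry above builds on Thompson sampling. For reference, here is the single-agent Beta-Bernoulli Thompson sampling core that such decentralized Bayesian methods extend; the paper's network aggregation is not reproduced, and all problem sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

K, T = 5, 2000                          # arms and horizon (illustrative)
mu = rng.uniform(0.1, 0.9, size=K)      # Bernoulli arm means

alpha = np.ones(K)                      # Beta(1, 1) prior per arm
beta = np.ones(K)
regret = 0.0

for _ in range(T):
    theta = rng.beta(alpha, beta)       # posterior sample per arm
    a = int(np.argmax(theta))           # play the arm that looks best
    r = rng.binomial(1, mu[a])
    alpha[a] += r                       # conjugate posterior update
    beta[a] += 1 - r
    regret += mu.max() - mu[a]

print(f"single-agent regret over {T} steps: {regret:.1f}")
```

A decentralized variant would additionally incorporate information received from neighbors into each agent's posterior, which is what drives the network-averaged regret guarantees.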