Cooperative Multi-Agent Graph Bandits: UCB Algorithm and Regret Analysis
- URL: http://arxiv.org/abs/2401.10383v2
- Date: Sun, 17 Mar 2024 17:04:03 GMT
- Title: Cooperative Multi-Agent Graph Bandits: UCB Algorithm and Regret Analysis
- Authors: Phevos Paschalidis, Runyu Zhang, Na Li
- Abstract summary: We formulate the multi-agent graph bandit problem as a multi-agent extension of the graph bandit problem introduced by Zhang, Johansson, and Li.
We propose an Upper Confidence Bound (UCB)-based learning algorithm, Multi-G-UCB, and prove that its expected regret over $T$ steps is bounded by $O(\gamma N\log(T)[\sqrt{KT} + DK])$.
- Score: 5.02063914741425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we formulate the multi-agent graph bandit problem as a multi-agent extension of the graph bandit problem introduced by Zhang, Johansson, and Li [CISS 57, 1-6 (2023)]. In our formulation, $N$ cooperative agents travel on a connected graph $G$ with $K$ nodes. Upon arrival at each node, agents observe a random reward drawn from a node-dependent probability distribution. The reward of the system is modeled as a weighted sum of the rewards the agents observe, where the weights capture some transformation of the reward associated with multiple agents sampling the same node at the same time. We propose an Upper Confidence Bound (UCB)-based learning algorithm, Multi-G-UCB, and prove that its expected regret over $T$ steps is bounded by $O(\gamma N\log(T)[\sqrt{KT} + DK])$, where $D$ is the diameter of graph $G$ and $\gamma$ a boundedness parameter associated with the weight functions. Lastly, we numerically test our algorithm by comparing it to alternative methods.
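The abstract gives enough structure for a rough illustration. The following Python is a hedged sketch of a Multi-G-UCB-style loop, not the authors' algorithm: the UCB1 bonus, the greedy marginal-gain allocation of agents to nodes, the one-hop shortest-path travel rule, and the even split of the collision weight `weight(m)` across co-located agents are all simplifying assumptions, and the names (`multi_g_ucb_sketch`, `pull`, `weight`) are hypothetical.
```python
import math
import random
from collections import deque, defaultdict

def bfs_next_step(adj, src, dst):
    """First hop on a shortest path from src to dst in an unweighted graph."""
    if src == dst:
        return src
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                if v == dst:
                    while parent[v] != src:  # walk back to the hop out of src
                        v = parent[v]
                    return v
                queue.append(v)
    raise ValueError("graph is not connected")

def multi_g_ucb_sketch(adj, pull, weight, N, T, seed=0):
    """Toy Multi-G-UCB-style loop; allocation and bonus are simplifications.

    adj:    dict node -> list of neighbours (connected graph G with K nodes)
    pull:   pull(node) -> one stochastic reward sample from that node
    weight: weight(m) -> system weight when m agents occupy a node (weight(0)=0)
    """
    rng = random.Random(seed)
    nodes = list(adj)
    counts = defaultdict(int)    # number of samples per node
    sums = defaultdict(float)    # running reward sum per node
    positions = [rng.choice(nodes) for _ in range(N)]
    system_reward = 0.0
    for t in range(1, T + 1):
        def ucb(v):  # optimistic node index (UCB1-style bonus)
            if counts[v] == 0:
                return 1e9  # force initial exploration
            return sums[v] / counts[v] + math.sqrt(2 * math.log(t) / counts[v])
        # Greedily assign agents to target nodes by marginal optimistic gain,
        # a stand-in for the paper's joint allocation step.
        load = defaultdict(int)
        targets = []
        for _ in range(N):
            best = max(nodes, key=lambda v: (weight(load[v] + 1) - weight(load[v])) * ucb(v))
            load[best] += 1
            targets.append(best)
        # Each agent takes one hop along a shortest path toward its target
        # and samples the node it lands on (travel costs time, as in G-UCB).
        occupancy = defaultdict(int)
        for i in range(N):
            positions[i] = bfs_next_step(adj, positions[i], targets[i])
            occupancy[positions[i]] += 1
        for i in range(N):
            v = positions[i]
            r = pull(v)
            counts[v] += 1
            sums[v] += r
            # Split the node's collision weight evenly over its m samples, so
            # the node contributes roughly weight(m) times its mean reward.
            system_reward += weight(occupancy[v]) * r / occupancy[v]
    return system_reward
```
For example, `multi_g_ucb_sketch({0: [1], 1: [0, 2], 2: [1]}, lambda v: random.gauss(0.2 * (v + 1), 0.1), lambda m: math.sqrt(m), N=2, T=500)` runs two agents on a three-node path with a concave (diminishing-returns) collision weight.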
Related papers
- Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
A network of $N$ agents communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold $\tau$.
We propose a safe distributed upper confidence bound algorithm, \textit{MA-OPLB}, and establish a high-probability bound on its $T$-round regret.
We show that our regret bound is of order $\mathcal{O}\left(\frac{d}{\tau-c_0}\frac{\log(NT)^2}{\sqrt{N}}\sqrt{T\log(1/|\lambda|)}\right)$.
arXiv Detail & Related papers (2024-10-22T19:34:53Z) - Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t\in[T]$ is observed.
SGB adopts an optimized explore-then-commit approach and is specifically designed for scenarios with a large set of base arms (a minimal sketch of the stochastic-greedy selection step appears after this list).
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Fair Multi-Agent Bandits [14.614647884175657]
We provide an algorithm with regret $O\left(N^3 \log\frac{B}{\Delta} f(\log T)\log T\right)$, where $f(t)$ is any function diverging to infinity with $t$.
This significantly improves previous results which had the same upper bound on the regret of order $O(f(\log T)\log T)$ but an exponential dependence on the number of agents.
arXiv Detail & Related papers (2023-06-07T15:05:53Z) - Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback.
We propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches this lower bound up to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z) - On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems.
Arms consist of two components: one that is shared across tasks (which we call the representation) and one that is task-specific (which we call the predictor).
We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound, and scales at most as $H(G\log(\delta_G) + X\log(\delta_H))$, with $X, G, H$ being, respectively, the number of tasks, representations and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z) - Multi-armed Bandit Learning on a Graph [0.0]
We study an extension of MAB called the graph bandit, where an agent travels over a graph to maximize the reward collected from different nodes.
We design a learning algorithm, G-UCB, that balances long-term exploration and exploitation using the principle of optimism.
Our proposed algorithm achieves $O(\sqrt{|S|T\log(T)} + D|S|\log T)$ learning regret, where $|S|$ is the number of nodes and $D$ is the diameter of the graph.
arXiv Detail & Related papers (2022-09-20T02:31:42Z) - Collaborative Multi-agent Stochastic Linear Bandits [28.268809091816287]
We study a collaborative multi-agent linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret.
All the agents observe the corresponding rewards of the played actions and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents.
arXiv Detail & Related papers (2022-05-12T19:46:35Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on \texttt{EXP3}.
arXiv Detail & Related papers (2020-12-10T15:40:07Z) - Distributed Bandits: Probabilistic Communication on $d$-regular Graphs [5.33024001730262]
We study the decentralized multi-agent multi-armed bandit problem for agents that communicate probabilistically over a network defined by a $d$-regular graph.
We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret.
arXiv Detail & Related papers (2020-11-16T04:53:54Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
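As a companion to the Combinatorial Stochastic-Greedy Bandit entry above, here is a hedged sketch of the stochastic-greedy subsampling device for picking $n$ of $K$ arms. The function name `stochastic_greedy_pick` and the use of a flat per-arm score are illustrative assumptions; the paper's SGB scores marginal gains of a joint reward inside an explore-then-commit schedule.
```python
import math
import random

def stochastic_greedy_pick(values, n, eps=0.1, seed=0):
    """Stochastic-greedy selection of n arms out of K (a sketch, not SGB itself).

    Each greedy step scores only a random subsample of roughly
    (K/n) * log(1/eps) arms instead of all K base arms -- the usual
    stochastic-greedy device for large arm sets. `values` maps each arm
    to a current (e.g. optimistic) score estimate.
    """
    rng = random.Random(seed)
    remaining = sorted(values)          # arm identifiers still available
    K = len(remaining)
    sample_size = max(1, math.ceil((K / n) * math.log(1 / eps)))
    chosen = []
    for _ in range(n):
        # Score only a random candidate subset, then pick its best arm.
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        best = max(candidates, key=lambda a: values[a])
        chosen.append(best)
        remaining.remove(best)
    return chosen
```
With additive per-arm scores this approximates a top-$n$ selection; the subsampling trick matters when $K$ is large and each score evaluation (here, a dictionary lookup; in SGB, a marginal-gain computation) is expensive.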