Individual Regret in Cooperative Stochastic Multi-Armed Bandits
- URL: http://arxiv.org/abs/2411.06501v1
- Date: Sun, 10 Nov 2024 15:54:23 GMT
- Title: Individual Regret in Cooperative Stochastic Multi-Armed Bandits
- Authors: Idan Barnea, Tal Lancewicki, Yishay Mansour,
- Abstract summary: We show a near-optimal individual regret bound of $tildeO(sqrtAT/m+A)$, where $A$ is the number of actions, $T$ the time horizon, and $m$ the number of agents.
In particular, assuming a sufficient number of agents, we achieve a regret bound of $tildeO(A)$, which is independent of the sub-optimality gaps and the diameter of the communication graph.
- Score: 44.34612921224053
- License:
- Abstract: We study the regret in stochastic Multi-Armed Bandits (MAB) with multiple agents that communicate over an arbitrary connected communication graph. We show a near-optimal individual regret bound of $\tilde{O}(\sqrt{AT/m}+A)$, where $A$ is the number of actions, $T$ the time horizon, and $m$ the number of agents. In particular, assuming a sufficient number of agents, we achieve a regret bound of $\tilde{O}(A)$, which is independent of the sub-optimality gaps and the diameter of the communication graph. To the best of our knowledge, our study is the first to show an individual regret bound in cooperative stochastic MAB that is independent of the graph's diameter and applicable to non-fully-connected communication graphs.
Related papers
- Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit.
In this setting, agents subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Cooperative Multi-Agent Graph Bandits: UCB Algorithm and Regret Analysis [5.02063914741425]
We formulate the multi-agent graph bandit problem as a multi-agent extension of the graph bandit problem introduced by Zhang, Johansson, and Li.
We propose an Upper Confidence Bound (UCB)-based learning algorithm, Multi-G-UCB, and prove that its expected regret over $T$ steps is bounded by $O(gamma Nlog(T)[sqrtKT + DK])$.
arXiv Detail & Related papers (2024-01-18T21:36:17Z) - Cooperative Thresholded Lasso for Sparse Linear Bandit [6.52540785559241]
We present a novel approach to address the multi-agent sparse contextual linear bandit problem.
It is first algorithm that tackles row-wise distributed data in sparse linear bandits.
It is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret.
arXiv Detail & Related papers (2023-05-30T16:05:44Z) - Cooperative Multi-Agent Reinforcement Learning: Asynchronous
Communication and Linear Function Approximation [77.09836892653176]
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes.
We propose a provably efficient algorithm based on value that enable asynchronous communication.
We show that a minimal $Omega(dM)$ communication complexity is required to improve the performance through collaboration.
arXiv Detail & Related papers (2023-05-10T20:29:29Z) - A Simple and Provably Efficient Algorithm for Asynchronous Federated
Contextual Linear Bandits [77.09836892653176]
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server.
We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication.
We prove that the regret of textttFedLinUCB is bounded by $tildeO(dsqrtsum_m=1M T_m)$ and the communication complexity is $tildeO(dM
arXiv Detail & Related papers (2022-07-07T06:16:19Z) - Multi-Agent Neural Rewriter for Vehicle Routing with Limited Disclosure
of Costs [65.23158435596518]
Solving the multi-vehicle routing problem as a team Markov game with partially observable costs.
Our multi-agent reinforcement learning approach, the so-called multi-agent Neural Rewriter, builds on the single-agent Neural Rewriter to solve the problem by iteratively rewriting solutions.
arXiv Detail & Related papers (2022-06-13T09:17:40Z) - Distributed Contextual Linear Bandits with Minimax Optimal Communication
Cost [48.288452411283444]
We study distributed contextual linear bandits with contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features.
We propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server.
We prove that over $T$ rounds ($NT$ actions in total) the communication cost of DisBE-LUCB is only $tildemathcalO(dN)$ and its regret is at most $tildemathcalO
arXiv Detail & Related papers (2022-05-26T05:56:23Z) - Collaborative Multi-agent Stochastic Linear Bandits [28.268809091816287]
We study a collaborative multi-agent linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret.
All the agents observe the corresponding rewards of the played actions and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents.
arXiv Detail & Related papers (2022-05-12T19:46:35Z) - Distributed Bandits: Probabilistic Communication on $d$-regular Graphs [5.33024001730262]
We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a $d$-regular graph.
We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret.
arXiv Detail & Related papers (2020-11-16T04:53:54Z) - The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits [20.259428328004738]
We consider a decentralized multi-agent Multi Armed Bandit (MAB) setup consisting of $N$ agents.
In our model, agents collaborate by exchanging messages through pairwise gossip style communications on an arbitrary connected graph.
arXiv Detail & Related papers (2020-01-15T17:49:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.