Distributed Bandits with Heterogeneous Agents
- URL: http://arxiv.org/abs/2201.09353v1
- Date: Sun, 23 Jan 2022 20:04:15 GMT
- Title: Distributed Bandits with Heterogeneous Agents
- Authors: Lin Yang, Yu-zhen Janice Chen, Mohammad Hajiesmaili, John CS Lui, Don
Towsley
- Abstract summary: This paper tackles a multi-agent bandit setting where $M$ agents cooperate to solve the same instance of a $K$-armed bandit problem.
We propose two learning algorithms, ucbo and AAE.
We prove that both algorithms achieve order-optimal regret of $O\left(\sum_{i:\tilde{\Delta}_i>0} \log T/\tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of arm $i$ and any local optimal arm.
- Score: 38.90376765616447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles a multi-agent bandit setting where $M$ agents
cooperate to solve the same instance of a $K$-armed stochastic bandit problem.
The agents are \textit{heterogeneous}: each agent has limited access to a local
subset of arms and the agents are asynchronous with different gaps between
decision-making rounds. The goal for each agent is to find its optimal local
arm, and agents can cooperate by sharing their observations with others. While
cooperation between agents improves learning performance, it comes with the
added complexity of communication between agents. For this
heterogeneous multi-agent setting, we propose two learning algorithms, ucbo
and AAE. We prove that both algorithms achieve order-optimal regret, which is
$O\left(\sum_{i:\tilde{\Delta}_i>0} \log T/\tilde{\Delta}_i\right)$, where
$\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of
arm $i$ and any local optimal arm. In addition, through a careful selection of
the information valuable for cooperation, AAE achieves a low communication
complexity of $O(\log T)$. Finally, numerical experiments verify the efficiency of
both algorithms.
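To make the setting concrete, below is a minimal Python sketch of the cooperative index-based idea described in the abstract: agents restricted to local arm subsets, acting asynchronously at different rates, and pooling every observation into shared statistics. All names and parameters (arm means, local subsets, activity rates, the simulation loop) are illustrative assumptions, not the authors' ucbo implementation; note also that AAE, per the abstract, communicates only carefully selected information to reach $O(\log T)$ communication, whereas this sketch broadcasts everything.

```python
import math
import random

# Hypothetical simulation of cooperative UCB over heterogeneous agents.
# Assumed parameters (not from the paper):
K = 5                                           # arms in the shared instance
MEANS = [0.9, 0.8, 0.7, 0.5, 0.3]               # true (unknown) reward means
LOCAL_ARMS = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]  # each agent's accessible arms
RATES = [1, 2, 4]                               # agent m acts when t % RATES[m] == 0
T = 10_000                                      # horizon

# Cooperation = every observation is shared, so counts and reward sums are
# global even though each agent can only pull arms in its local subset.
counts = [0] * K
sums = [0.0] * K

def ucb_index(i: int, t: int) -> float:
    """Standard UCB1 index computed from the pooled statistics."""
    if counts[i] == 0:
        return float("inf")                     # force one pull of each reachable arm
    return sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])

random.seed(0)
regret = 0.0
for t in range(T):
    for m, arms in enumerate(LOCAL_ARMS):
        if t % RATES[m] != 0:
            continue                            # agent m is idle this round (asynchrony)
        arm = max(arms, key=lambda i: ucb_index(i, t))         # maximize over local arms only
        reward = 1.0 if random.random() < MEANS[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1                        # "broadcast": pooled stats updated for all
        sums[arm] += reward
        regret += max(MEANS[i] for i in arms) - MEANS[arm]     # gap to local optimum

print(f"cumulative regret over {T} rounds: {regret:.1f}")
```

Because the statistics are pooled, each arm's confidence interval shrinks using samples from every agent that can reach it; this shared shrinkage, rather than any change to the index rule itself, is the source of the cooperative speedup the abstract describes.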
Related papers
- Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
A network of $N$ agents communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold $\tau$.
We propose a safe distributed upper confidence bound algorithm, called MA-OPLB, and establish a high-probability bound on its $T$-round regret.
We show that our regret bound is of order $\mathcal{O}\left(\frac{d}{\tau-c_0}\frac{\log(NT)^2}{\sqrt{N}}\sqrt{T\log(1/|\lambda|)}\right)$.
arXiv Detail & Related papers (2024-10-22T19:34:53Z) - Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Cooperative Multi-Agent Reinforcement Learning: Asynchronous
Communication and Linear Function Approximation [77.09836892653176]
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes.
We propose a provably efficient algorithm based on value iteration that enables asynchronous communication.
We show that a minimal $\Omega(dM)$ communication complexity is required to improve the performance through collaboration.
arXiv Detail & Related papers (2023-05-10T20:29:29Z) - On the Sample Complexity of Representation Learning in Multi-task
Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems.
Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor).
We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound and scales at most as $H(G\log(\delta_G)+X\log(\delta_H))$, with $X, G, H$ being, respectively, the number of tasks, representations, and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z) - A Simple and Provably Efficient Algorithm for Asynchronous Federated
Contextual Linear Bandits [77.09836892653176]
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server.
We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication.
We prove that the regret of FedLinUCB is bounded by $\tilde{O}\left(d\sqrt{\sum_{m=1}^{M} T_m}\right)$ and the communication complexity is $\tilde{O}(dM^2)$.
arXiv Detail & Related papers (2022-07-07T06:16:19Z) - Distributed Contextual Linear Bandits with Minimax Optimal Communication
Cost [48.288452411283444]
We study distributed contextual linear bandits, where $N$ agents act cooperatively to solve a linear bandit optimization problem with $d$-dimensional features.
We propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server.
We prove that over $T$ rounds ($NT$ actions in total) the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is at most $\tilde{\mathcal{O}}(\sqrt{dNT})$.
arXiv Detail & Related papers (2022-05-26T05:56:23Z) - Communication Efficient Parallel Reinforcement Learning [34.77250498401055]
We consider the problem where $M$ agents interact with $M$ identical and independent environments with $S$ states and $A$ actions.
We aim to find an algorithm that allows the agents to minimize the regret with infrequent communication rounds.
arXiv Detail & Related papers (2021-02-22T02:46:36Z) - Robust Multi-Agent Multi-Armed Bandits [26.26185074977412]
Recent works have shown that agents facing independent instances of a $K$-armed bandit can collaborate to decrease regret.
We show that collaboration under the proposed algorithm indeed decreases regret, assuming the number of malicious agents $m$ is small compared to $K$, but without assumptions on the malicious agents' behavior.
arXiv Detail & Related papers (2020-07-07T22:27:30Z)