A Simple and Provably Efficient Algorithm for Asynchronous Federated
Contextual Linear Bandits
- URL: http://arxiv.org/abs/2207.03106v1
- Date: Thu, 7 Jul 2022 06:16:19 GMT
- Title: A Simple and Provably Efficient Algorithm for Asynchronous Federated
Contextual Linear Bandits
- Authors: Jiafan He and Tianhao Wang and Yifei Min and Quanquan Gu
- Abstract summary: We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server.
We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication.
We prove that the regret of \texttt{FedLinUCB} is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^M T_m})$ and the communication complexity is $\tilde{O}(dM^2)$.
- Score: 77.09836892653176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study federated contextual linear bandits, where $M$ agents cooperate with
each other to solve a global contextual linear bandit problem with the help of
a central server. We consider the asynchronous setting, where all agents work
independently and the communication between one agent and the server will not
trigger other agents' communication. We propose a simple algorithm named
\texttt{FedLinUCB} based on the principle of optimism. We prove that the regret
of \texttt{FedLinUCB} is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^M T_m})$ and
the communication complexity is $\tilde{O}(dM^2)$, where $d$ is the dimension
of the contextual vector and $T_m$ is the total number of interactions with the
environment by $m$-th agent. To the best of our knowledge, this is the first
provably efficient algorithm that allows fully asynchronous communication for
federated contextual linear bandits, while achieving the same regret guarantee
as in the single-agent setting.
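The abstract states the regret and communication bounds but includes no pseudocode. Below is a minimal sketch, under stated assumptions, of how an optimism-based asynchronous federated linear bandit in the spirit of \texttt{FedLinUCB} could be organized: each agent keeps local sufficient statistics, acts with a LinUCB-style upper confidence bound, and syncs with the server only when its own covariance matrix has grown enough. The determinant-ratio trigger, the constant exploration bonus beta, and all class and variable names are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

class Server:
    """Central server: aggregates sufficient statistics uploaded by agents."""
    def __init__(self, d, lam=1.0):
        self.V = lam * np.eye(d)   # aggregated (ridge-regularized) covariance
        self.u = np.zeros(d)       # aggregated reward-weighted features

    def sync(self, dV, du):
        """Absorb one agent's local updates and return the current global state."""
        self.V += dV
        self.u += du
        return self.V.copy(), self.u.copy()

class Agent:
    """One agent running a LinUCB-style optimism rule on its own interaction stream."""
    def __init__(self, d, server, beta=1.0, gamma=1.1):
        self.server = server
        self.beta = beta            # exploration bonus scale (assumed constant here)
        self.gamma = gamma          # determinant-growth threshold for communicating
        self.V_syn, self.u_syn = server.V.copy(), server.u.copy()  # last downloaded state
        self.dV = np.zeros((d, d))  # local updates not yet uploaded
        self.du = np.zeros(d)

    def act(self, contexts):
        """Pick the arm with the largest upper confidence bound."""
        V = self.V_syn + self.dV
        theta = np.linalg.solve(V, self.u_syn + self.du)
        Vinv = np.linalg.inv(V)
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", contexts, Vinv, contexts))
        return int(np.argmax(contexts @ theta + self.beta * bonus))

    def update(self, x, reward):
        """Record the observed reward; upload to the server only when the local
        covariance has grown enough (an assumed determinant-ratio trigger)."""
        self.dV += np.outer(x, x)
        self.du += reward * x
        grown = (np.linalg.slogdet(self.V_syn + self.dV)[1]
                 - np.linalg.slogdet(self.V_syn)[1])
        if grown > np.log(self.gamma):
            self.V_syn, self.u_syn = self.server.sync(self.dV, self.du)
            self.dV[:], self.du[:] = 0.0, 0.0

# Toy usage on synthetic data: agents are activated in an arbitrary order, and
# one agent's upload never blocks or triggers the others.
rng = np.random.default_rng(0)
d, M, K = 5, 3, 10
theta_star = rng.normal(size=d) / np.sqrt(d)
server = Server(d)
agents = [Agent(d, server) for _ in range(M)]
for t in range(200):
    m = int(rng.integers(M))
    contexts = rng.normal(size=(K, d))
    a = agents[m].act(contexts)
    reward = contexts[a] @ theta_star + 0.1 * rng.normal()
    agents[m].update(contexts[a], reward)
```

In this sketch, communication is fully event-driven per agent, which mirrors the asynchronous setting described in the abstract; the stated regret and communication guarantees apply to the paper's algorithm, not to this simplified illustration.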
Related papers
- Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Federated Contextual Cascading Bandits with Asynchronous Communication
and Heterogeneous Users [95.77678166036561]
We propose a UCB-type algorithm with delicate communication protocols.
We give sub-linear regret bounds on par with those achieved in the synchronous framework.
Empirical evaluation on synthetic and real-world datasets validates our algorithm's superior performance in terms of regrets and communication costs.
arXiv Detail & Related papers (2024-02-26T05:31:14Z) - Cooperative Multi-Agent Reinforcement Learning: Asynchronous
Communication and Linear Function Approximation [77.09836892653176]
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes.
We propose a provably efficient algorithm based on value iteration that enables asynchronous communication.
We show that a minimal $\Omega(dM)$ communication complexity is required to improve the performance through collaboration.
arXiv Detail & Related papers (2023-05-10T20:29:29Z) - Distributed Contextual Linear Bandits with Minimax Optimal Communication
Cost [48.288452411283444]
We study distributed contextual linear bandits, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features.
We propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server.
We prove that over $T$ rounds ($NT$ actions in total) the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is at most $\tilde{\mathcal{O}}(\ldots)$
arXiv Detail & Related papers (2022-05-26T05:56:23Z) - Distributed Bandits with Heterogeneous Agents [38.90376765616447]
This paper tackles a multi-agent bandit setting where $M$ agents cooperate together to solve the same instance of a $K$-armed bandit problem.
We propose two learning algorithms, ucbo and AAE.
We prove that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tilde{\Delta}_i>0} \log T/\tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of
arXiv Detail & Related papers (2022-01-23T20:04:15Z) - Distributed Saddle-Point Problems Under Similarity [173.19083235638104]
We show that a given suboptimality $\epsilon>0$ is achieved over master/workers networks in $\Omega\big(\ldots\big)$.
We then propose algorithms matching the lower bounds for either type of network (up to log factors).
We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.
arXiv Detail & Related papers (2021-07-22T14:25:16Z) - Communication Efficient Parallel Reinforcement Learning [34.77250498401055]
We consider the problem where $M$ agents interact with $M$ identical and independent environments with $S$ states and $A$ actions.
We aim to find an algorithm that allows the agents to minimize the regret with infrequent communication rounds.
arXiv Detail & Related papers (2021-02-22T02:46:36Z)