Federated Bandit: A Gossiping Approach
- URL: http://arxiv.org/abs/2010.12763v2
- Date: Wed, 7 Apr 2021 04:59:14 GMT
- Title: Federated Bandit: A Gossiping Approach
- Authors: Zhaowei Zhu, Jingxuan Zhu, Ji Liu, Yang Liu
- Abstract summary: We study \emph{Federated Bandit}, a decentralized Multi-Armed Bandit problem with a set of $N$ agents, who can only communicate their local data with neighbors described by a connected graph $G$.
We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of $O(\max\{\texttt{poly}(N,M) \log T, \texttt{poly}(N,M)\log_{\lambda_2^{-1}} N\})$.
- Score: 22.595542913855624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study \emph{Federated Bandit}, a decentralized Multi-Armed
Bandit problem with a set of $N$ agents, who can only communicate their local
data with neighbors described by a connected graph $G$. Each agent makes a
sequence of decisions on selecting an arm from $M$ candidates, yet they only
have access to local and potentially biased feedback/evaluation of the true
reward for each action taken. Learning only locally will lead agents to
sub-optimal actions, while converging to a no-regret strategy requires a
collection of distributed data. Motivated by the proposal of federated
learning, we aim for a solution with which agents will never share their local
observations with a central entity, and are only allowed to share a private
copy of their own information with their neighbors. We first propose a
decentralized bandit algorithm Gossip_UCB, which is a coupling of variants of
both the classical gossiping algorithm and the celebrated Upper Confidence
Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local
bandit learning into a global gossiping process for sharing information among
connected agents, and achieves guaranteed regret at the order of $O(\max\{
\texttt{poly}(N,M) \log T, \texttt{poly}(N,M)\log_{\lambda_2^{-1}} N\})$ for
all $N$ agents, where $\lambda_2\in(0,1)$ is the second largest eigenvalue of
the expected gossip matrix, which is a function of $G$. We then propose
Fed_UCB, a differentially private version of Gossip_UCB, in which the agents
preserve $\epsilon$-differential privacy of their local data while achieving
$O(\max \{\frac{\texttt{poly}(N,M)}{\epsilon}\log^{2.5} T, \texttt{poly}(N,M)
(\log_{\lambda_2^{-1}} N + \log T) \})$ regret.
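To make the Gossip_UCB idea concrete, here is a minimal illustrative sketch (not the authors' exact algorithm): each agent keeps local arm-mean estimates, selects arms with a UCB rule, and after every round mixes its estimates with its neighbors' through one step of gossip averaging over a doubly stochastic matrix $W$ built from $G$. The ring graph, noise scale, and variable names below are assumptions made for this example.

```python
import numpy as np

# Sketch of gossip-averaged UCB: N agents, M arms, T rounds.
rng = np.random.default_rng(0)
N, M, T = 4, 3, 2000

# Gossip weight matrix for a ring graph G: doubly stochastic, each agent
# averages with itself and its two neighbors.
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = 0.25
    W[i, (i + 1) % N] = 0.25

true_means = rng.uniform(0.2, 0.8, size=M)   # global mean reward per arm
bias = rng.uniform(-0.1, 0.1, size=(N, M))   # local, potentially biased feedback

est = np.zeros((N, M))     # gossiped estimates of the arm means
counts = np.ones((N, M))   # local pull counts (start at 1 to avoid division by zero)

for t in range(1, T + 1):
    for i in range(N):
        # UCB index built on the network-averaged estimate.
        ucb = est[i] + np.sqrt(2.0 * np.log(t) / counts[i])
        a = int(np.argmax(ucb))
        # Locally biased, noisy observation of the true reward.
        r = true_means[a] + bias[i, a] + 0.05 * rng.standard_normal()
        counts[i, a] += 1
        est[i, a] += (r - est[i, a]) / counts[i, a]
    # One gossip round: every agent replaces its estimates with a convex
    # combination of its own and its neighbors' estimates.
    est = W @ est

print("per-agent best-arm guess:", est.argmax(axis=1))
print("true best arm:", int(true_means.argmax()))
```

In the Fed_UCB variant described above, each agent would additionally perturb the copy of its statistics before gossiping it (for example, with noise calibrated to the $\epsilon$-differential-privacy budget); the sketch omits that step.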
Related papers
- Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
A network of $N$ agents communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold $\tau$.
We propose a safe distributed upper confidence bound algorithm, the so-called \textit{MA-OPLB}, and establish a high probability bound on its $T$-round regret.
We show that our regret bound is of order $\mathcal{O}\left(\frac{d}{\tau-c_0}\frac{\log(NT)^2}{\sqrt{N}}\sqrt{T\log(1/|\lambda|)}\right)$.
arXiv Detail & Related papers (2024-10-22T19:34:53Z) - Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Federated Linear Bandits with Finite Adversarial Actions [20.1041278044797]
We study a federated linear bandits model, where $M$ clients communicate with a central server to solve a linear contextual bandits problem.
To address the unique challenges of adversarial finite action sets, we propose the FedSupLinUCB algorithm.
We prove that FedSupLinUCB achieves a total regret of $\tilde{O}(\sqrt{dT})$, where $T$ is the total number of arm pulls from all clients, and $d$ is the ambient dimension of the linear model.
arXiv Detail & Related papers (2023-11-02T03:41:58Z) - Optimal Algorithms for Latent Bandits with Cluster Structure [50.44722775727619]
We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem.
We propose LATTICE, which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$.
arXiv Detail & Related papers (2023-01-17T17:49:04Z) - A Simple and Provably Efficient Algorithm for Asynchronous Federated
Contextual Linear Bandits [77.09836892653176]
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server.
We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication.
We prove that the regret of \texttt{FedLinUCB} is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^{M} T_m})$ and the communication complexity is $\tilde{O}(dM^2)$.
arXiv Detail & Related papers (2022-07-07T06:16:19Z) - Distributed Contextual Linear Bandits with Minimax Optimal Communication
Cost [48.288452411283444]
We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features.
We propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server.
We prove that over $T$ rounds ($NT$ actions in total) the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is at most $\tilde{\mathcal{O}}(\sqrt{dNT})$.
arXiv Detail & Related papers (2022-05-26T05:56:23Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only \emph{logarithmic} dependence on $M$, as long as certain conditions are satisfied.
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Distributed Saddle-Point Problems Under Similarity [173.19083235638104]
We show that a given suboptimality $\epsilon>0$ is achieved over master/workers networks in a number of communication rounds for which we establish lower bounds.
We then propose algorithms matching the lower bounds for both types of networks considered (up to logarithmic factors).
We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.
arXiv Detail & Related papers (2021-07-22T14:25:16Z) - Multi-Agent Multi-Armed Bandits with Limited Communication [41.63062883750805]
We consider the problem where $N$ agents interact with an instance of a $K$-armed bandit problem for $K \gg N$.
The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of $T$ time steps, the number of communication rounds, and the number of bits in each communication round.
We present Limited Communication Collaboration - Upper Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best arm it knows.
arXiv Detail & Related papers (2021-02-10T06:28:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.