Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost
- URL: http://arxiv.org/abs/2205.13170v1
- Date: Thu, 26 May 2022 05:56:23 GMT
- Title: Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost
- Authors: Sanae Amani, Tor Lattimore, András György, Lin F. Yang
- Abstract summary: We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features.
We propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server.
We prove that over $T$ rounds ($NT$ actions in total) the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is at most $\tilde{\mathcal{O}}(\sqrt{dNT})$.
- Score: 48.288452411283444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study distributed contextual linear bandits with stochastic contexts,
where $N$ agents act cooperatively to solve a linear bandit-optimization
problem with $d$-dimensional features. For this problem, we propose a
distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB,
where the agents share information among each other through a central server.
We prove that over $T$ rounds ($NT$ actions in total) the communication cost of
DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is at most
$\tilde{\mathcal{O}}(\sqrt{dNT})$, which is of the same order as that incurred
by an optimal single-agent algorithm for $NT$ rounds. Remarkably, we derive an
information-theoretic lower bound on the communication cost of the distributed
contextual linear bandit problem with stochastic contexts, and prove that our
proposed algorithm is nearly minimax optimal in terms of \emph{both regret and
communication cost}. Finally, we propose DecBE-LUCB, a fully decentralized
version of DisBE-LUCB, which operates without a central server, where agents
share information with their \emph{immediate neighbors} through a carefully
designed consensus procedure.
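To make the server-based protocol concrete, below is a minimal, self-contained Python sketch of a batched, server-coordinated LinUCB-style scheme in the spirit of DisBE-LUCB. It is an illustration under simplifying assumptions, not the paper's algorithm: the $\sqrt{T}$-length batch schedule, the confidence width `beta`, and the use of optimistic action selection in place of the paper's batch elimination rule are all assumptions made here.

```python
import numpy as np

# Sketch of a server-coordinated, batched LinUCB-style protocol.
# NOT the exact DisBE-LUCB of the paper: the batch schedule, confidence
# width, and action rule are simplified assumptions.

d, N, T, K = 5, 4, 1000, 20                   # dims, agents, rounds, arms
rng = np.random.default_rng(0)
theta_star = rng.normal(size=d) / np.sqrt(d)  # unknown parameter

A = np.eye(d)            # server's global (regularized) Gram matrix
b = np.zeros(d)          # server's global reward-weighted feature sum
theta_hat = np.zeros(d)  # last broadcast estimate
batch_len = int(np.sqrt(T))                   # assumed batch schedule

for start in range(0, T, batch_len):
    # Within a batch, agents act on the last broadcast and only
    # accumulate local sufficient statistics; no communication occurs.
    dA = [np.zeros((d, d)) for _ in range(N)]
    db = [np.zeros(d) for _ in range(N)]
    A_inv = np.linalg.inv(A)
    beta = np.sqrt(d * np.log(T))             # simplified confidence width
    for t in range(start, min(start + batch_len, T)):
        for i in range(N):
            X = rng.normal(size=(K, d))       # stochastic contexts
            ucb = X @ theta_hat + beta * np.sqrt(
                np.einsum('kd,df,kf->k', X, A_inv, X))
            x = X[np.argmax(ucb)]             # optimistic action choice
            r = x @ theta_star + 0.1 * rng.normal()
            dA[i] += np.outer(x, x)
            db[i] += r * x
    # Communication happens once per batch: agents upload increments,
    # the server aggregates and broadcasts a fresh estimate.
    A += sum(dA)
    b += sum(db)
    theta_hat = np.linalg.solve(A, b)
```

Note that this sketch uploads full $d \times d$ Gram increments, i.e., $\mathcal{O}(d^2)$ numbers per agent per batch; the paper's $\tilde{\mathcal{O}}(dN)$ communication bound hinges on exchanging only $d$-dimensional quantities, which the sketch makes no attempt to reproduce.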
Related papers
- Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
A network of $N$ agents communicates locally to minimize their collective regret while keeping their expected cost under a specified threshold $\tau$.
We propose a safe distributed upper confidence bound algorithm, called \textit{MA-OPLB}, and establish a high-probability bound on its $T$-round regret.
We show that our regret bound is of order $\mathcal{O}\left(\frac{d}{\tau-c_0}\frac{\log(NT)^2}{\sqrt{N}}\sqrt{T\log(1/|\lambda|)}\right)$.
arXiv Detail & Related papers (2024-10-22T19:34:53Z) - Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation [77.09836892653176]
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes.
We propose a provably efficient algorithm based on value iteration that enables asynchronous communication.
We show that a minimal $\Omega(dM)$ communication complexity is required to improve performance through collaboration.
arXiv Detail & Related papers (2023-05-10T20:29:29Z) - A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits [77.09836892653176]
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server.
We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication.
We prove that the regret of \texttt{FedLinUCB} is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^{M} T_m})$ and the communication complexity is $\tilde{O}(dM^2)$.
arXiv Detail & Related papers (2022-07-07T06:16:19Z) - Settling the Communication Complexity for Distributed Offline Reinforcement Learning [10.315054389907031]
We study a novel setting in offline reinforcement learning (RL) where a number of distributed machines jointly cooperate to solve the problem.
There is a budget constraint on the total amount of information (in bits) that each machine can send out.
For value function prediction in contextual bandits, and both episodic and non-episodic MDPs, we establish information-theoretic lower bounds on the minimax risk.
arXiv Detail & Related papers (2022-02-10T06:27:07Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\log(1/\varepsilon)\big)$ communication steps.
This rate matches, for the first time and up to poly-log factors, lower communication complexity bounds for distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Decentralized Multi-Agent Linear Bandits with Safety Constraints [31.67685495996986]
We study decentralized linear bandits, where a network of $N$ agents acts cooperatively to solve a linear bandit-optimization problem.
We propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
We show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits.
arXiv Detail & Related papers (2020-12-01T07:33:00Z) - Federated Bandit: A Gossiping Approach [22.595542913855624]
We study \emph{Federated Bandit}, a decentralized Multi-Armed Bandit problem with a set of $N$ agents, who can only communicate their local data with neighbors described by a connected graph $G$.
We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves a guaranteed regret of order $O(\max\{\texttt{poly}(N,M)\log T,\ \texttt{poly}(N,M)\log_{\lambda^{-1}} N\})$; a minimal sketch of the underlying consensus-averaging primitive appears after this list.
arXiv Detail & Related papers (2020-10-24T03:44:25Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
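Several entries above, as well as DecBE-LUCB in the main abstract, replace the central server with agents that average information over a communication graph. The Python sketch below shows the generic consensus-averaging primitive such methods build on; the ring topology and the doubly stochastic mixing matrix `W` are illustrative assumptions, not any one paper's protocol.

```python
import numpy as np

# Generic consensus averaging used by decentralized bandit methods:
# each agent repeatedly replaces its local vector with a weighted
# average of its neighbors' vectors. With a doubly stochastic mixing
# matrix W, all agents converge to the network-wide mean.

N, d = 6, 4
rng = np.random.default_rng(1)
local = rng.normal(size=(N, d))   # row i: agent i's local estimate

# Assumption: ring topology with lazy uniform weights (doubly stochastic).
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = 0.25
    W[i, (i + 1) % N] = 0.25

target = local.mean(axis=0)       # the quantity consensus recovers
err = np.abs(local - target).max()
for step in range(50):
    local = W @ local             # one round of neighbor-only exchange
    err = np.abs(local - target).max()
print(f"max deviation from network mean after 50 steps: {err:.2e}")
```

The deviation shrinks geometrically at a rate governed by the second-largest eigenvalue modulus of $W$, which is why $\log(1/|\lambda|)$-style factors appear in bounds such as those quoted for MA-OPLB and Gossip_UCB above.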