Decentralized Multi-Agent Linear Bandits with Safety Constraints
- URL: http://arxiv.org/abs/2012.00314v1
- Date: Tue, 1 Dec 2020 07:33:00 GMT
- Title: Decentralized Multi-Agent Linear Bandits with Safety Constraints
- Authors: Sanae Amani, Christos Thrampoulidis
- Abstract summary: We study decentralized linear bandits, where a network of $N$ agents acts cooperatively to solve a linear bandit-optimization problem.
We propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
We show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits.
- Score: 31.67685495996986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study decentralized stochastic linear bandits, where a network of $N$
agents acts cooperatively to efficiently solve a linear bandit-optimization
problem over a $d$-dimensional space. For this problem, we propose DLUCB: a
fully decentralized algorithm that minimizes the cumulative regret over the
entire network. At each round of the algorithm each agent chooses its actions
following an upper confidence bound (UCB) strategy and agents share information
with their immediate neighbors through a carefully designed consensus procedure
that repeats over cycles. Our analysis adjusts the duration of these
communication cycles ensuring near-optimal regret performance
$\mathcal{O}(d\log{NT}\sqrt{NT})$ at a communication rate of
$\mathcal{O}(dN^2)$ per round. The structure of the network affects the regret
performance via a small additive term - coined the regret of delay - that
depends on the spectral gap of the underlying graph. Notably, our results apply
to arbitrary network topologies without a requirement for a dedicated agent
acting as a server. In consideration of situations with high communication
cost, we propose RC-DLUCB: a modification of DLUCB with rare communication
among agents. The new algorithm trades off regret performance for a
significantly reduced total communication cost of $\mathcal{O}(d^3N^{2.5})$
over all $T$ rounds. Finally, we show that our ideas extend naturally to the
emerging, albeit more challenging, setting of safe bandits. For the recently
studied problem of linear bandits with unknown linear safety constraints, we
propose the first safe decentralized algorithm. Our study contributes towards
applying bandit techniques in safety-critical distributed systems that
repeatedly deal with unknown stochastic environments. We present numerical
simulations for various network topologies that corroborate our theoretical
findings.
Related papers
- Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
A network of $N$ agents communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold $tau$.
We propose a safe distributed upper confidence bound algorithm, so called textitMA-OPLB, and establish a high probability bound on its $T$-round regret.
We show that our regret bound is of order $ mathcalOleft(fracdtau-c_0fraclog(NT)2sqrtNsqrtTlog (1/|lambda|)
arXiv Detail & Related papers (2024-10-22T19:34:53Z) - Cooperative Thresholded Lasso for Sparse Linear Bandit [6.52540785559241]
We present a novel approach to address the multi-agent sparse contextual linear bandit problem.
It is first algorithm that tackles row-wise distributed data in sparse linear bandits.
It is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret.
arXiv Detail & Related papers (2023-05-30T16:05:44Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - A Simple and Provably Efficient Algorithm for Asynchronous Federated
Contextual Linear Bandits [77.09836892653176]
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server.
We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication.
We prove that the regret of textttFedLinUCB is bounded by $tildeO(dsqrtsum_m=1M T_m)$ and the communication complexity is $tildeO(dM
arXiv Detail & Related papers (2022-07-07T06:16:19Z) - Distributed Contextual Linear Bandits with Minimax Optimal Communication
Cost [48.288452411283444]
We study distributed contextual linear bandits with contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features.
We propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server.
We prove that over $T$ rounds ($NT$ actions in total) the communication cost of DisBE-LUCB is only $tildemathcalO(dN)$ and its regret is at most $tildemathcalO
arXiv Detail & Related papers (2022-05-26T05:56:23Z) - Collaborative Multi-agent Stochastic Linear Bandits [28.268809091816287]
We study a collaborative multi-agent linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret.
All the agents observe the corresponding rewards of the played actions and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents.
arXiv Detail & Related papers (2022-05-12T19:46:35Z) - A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian
Process Bandits [118.22458816174144]
We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants.
Our algorithm, GP Robust Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation.
We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T21:19:36Z) - Restless-UCB, an Efficient and Low-complexity Algorithm for Online
Restless Bandits [61.490254407420906]
We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain.
We propose Restless-UCB, a learning policy that follows the explore-then-commit framework.
arXiv Detail & Related papers (2020-11-05T05:16:04Z) - Bayesian Algorithms for Decentralized Stochastic Bandits [12.350564981588063]
We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network.
In our model, each arm's reward distribution is same for all agents, and rewards are drawn independently across agents and over time steps.
The goal is to minimize cumulative regret averaged over the entire network.
arXiv Detail & Related papers (2020-10-20T19:14:20Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB)
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.