Collaborative Multi-agent Stochastic Linear Bandits
- URL: http://arxiv.org/abs/2205.06331v1
- Date: Thu, 12 May 2022 19:46:35 GMT
- Title: Collaborative Multi-agent Stochastic Linear Bandits
- Authors: Ahmadreza Moradipari, Mohammad Ghavamzadeh, and Mahnoosh Alizadeh
- Abstract summary: We study a collaborative multi-agent linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret.
All the agents observe the corresponding rewards of the played actions and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents.
- Score: 28.268809091816287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a collaborative multi-agent stochastic linear bandit setting, where
$N$ agents that form a network communicate locally to minimize their overall
regret. In this setting, each agent has its own linear bandit problem (its own
reward parameter) and the goal is to select the best global action w.r.t. the
average of their reward parameters. At each round, each agent proposes an
action, and one action is randomly selected and played as the network action.
All the agents observe the corresponding rewards of the played actions and use
an accelerated consensus procedure to compute an estimate of the average of the
rewards obtained by all the agents. We propose a distributed upper confidence
bound (UCB) algorithm and prove a high probability bound on its $T$-round
regret in which we include a linear growth of regret associated with each
communication round. Our regret bound is of order
$\mathcal{O}\Big(\sqrt{\frac{T}{N \log(1/|\lambda_2|)}}\cdot (\log T)^2\Big)$,
where $\lambda_2$ is the second largest (in absolute value) eigenvalue of the
communication matrix.
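As a rough illustration of this protocol, the sketch below runs one network round in Python: every agent proposes an optimistic action, a uniformly random proposal is played as the network action, all agents observe their own rewards, and a consensus pass estimates the network-average reward. The heavy-ball acceleration is a standard scheme standing in for the paper's accelerated consensus procedure, and the `ucb_index`/`observe`/`update` agent interface is hypothetical.

```python
import numpy as np

def accelerated_consensus(x, W, lam2, num_iters):
    """Heavy-ball-accelerated gossip averaging (a standard acceleration; the
    paper's exact procedure may differ). x: (N,) local scalars; W: (N, N)
    doubly stochastic communication matrix with second-largest absolute
    eigenvalue lam2. Preserves the mean while contracting disagreement
    faster than plain averaging x <- W x."""
    s = np.sqrt(1.0 - lam2 ** 2)
    eta = (1.0 - s) / (1.0 + s)  # momentum weight tuned to lam2
    x_prev, x_cur = x.copy(), W @ x
    for _ in range(num_iters - 1):
        x_prev, x_cur = x_cur, (1.0 + eta) * (W @ x_cur) - eta * x_prev
    return x_cur

def network_round(agents, actions, W, lam2, consensus_iters, rng):
    """One round of the distributed UCB protocol described in the abstract
    (hypothetical agent interface)."""
    proposals = [max(actions, key=agent.ucb_index) for agent in agents]
    played = proposals[rng.integers(len(agents))]  # random proposal wins
    rewards = np.array([agent.observe(played) for agent in agents])
    # Agents agree on an estimate of the average reward via consensus.
    avg_rewards = accelerated_consensus(rewards, W, lam2, consensus_iters)
    for agent, r_bar in zip(agents, avg_rewards):
        agent.update(played, r_bar)  # refine estimate of the average model
    return played, avg_rewards
```

More consensus iterations tighten the average-reward estimates, but each communication round also adds linearly to regret per the abstract; the $\log(1/|\lambda_2|)$ factor in the bound reflects that well-connected networks need fewer iterations to resolve this trade-off.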
Related papers
- Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
A network of $N$ agents communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold $\tau$.
We propose a safe distributed upper confidence bound algorithm, called \textit{MA-OPLB}, and establish a high-probability bound on its $T$-round regret.
We show that our regret bound is of order $\mathcal{O}\left(\frac{d}{\tau-c_0}\frac{\log(NT)^2}{\sqrt{N}}\sqrt{\frac{T}{\log(1/|\lambda|)}}\right)$.
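To make the threshold mechanism concrete, here is a minimal sketch of the safe-selection idea; MA-OPLB's exact rule is not given in the summary, so the shared confidence width and parameter names below are assumptions.

```python
import numpy as np

def safe_optimistic_action(actions, theta_hat, mu_hat, V_inv, beta, tau):
    """Pick the reward-optimistic action among those whose *pessimistic*
    (upper-confidence) cost estimate stays below the threshold tau.
    theta_hat / mu_hat: reward / cost parameter estimates; V_inv: inverse
    Gram matrix; beta: confidence-ellipsoid width (all assumed shapes)."""
    best, best_ucb = None, -np.inf
    for x in actions:
        width = beta * np.sqrt(x @ V_inv @ x)
        if mu_hat @ x + width > tau:   # cost UCB may exceed the threshold,
            continue                   # so the action is not provably safe
        reward_ucb = theta_hat @ x + width
        if reward_ucb > best_ucb:
            best, best_ucb = x, reward_ucb
    return best  # None if no action is certifiably safe this round
```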
arXiv Detail & Related papers (2024-10-22T19:34:53Z)
- Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z)
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
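The stochastic-greedy primitive that SGB builds on can be sketched in a few lines; `marginal_gain` is a hypothetical estimate of an arm's marginal contribution to the joint reward, and the explore-then-commit wrapper is omitted.

```python
import math
import numpy as np

def stochastic_greedy_select(n, num_arms, marginal_gain, eps=0.1, rng=None):
    """Pick n of num_arms base arms greedily, but score only a small random
    candidate pool per step (the classic stochastic-greedy trick, cutting
    evaluations from O(num_arms * n) to O(num_arms * log(1/eps)))."""
    rng = rng or np.random.default_rng()
    pool_size = max(1, int(num_arms / n * math.log(1.0 / eps)))
    chosen = []
    for _ in range(n):
        remaining = [a for a in range(num_arms) if a not in chosen]
        cand = rng.choice(remaining, size=min(pool_size, len(remaining)),
                          replace=False)
        chosen.append(int(max(cand, key=lambda a: marginal_gain(chosen, a))))
    return chosen
```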
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- Simple Opinion Dynamics for No-Regret Learning [38.61048016579232]
We study a cooperative multi-agent bandit setting in the distributed GOSSIP model.
We introduce and analyze families of memoryless and time-independent protocols for this setting.
For stationary reward settings, we prove for the first time that these simple protocols exhibit best-of-both-worlds behavior.
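For intuition, one plausible memoryless, time-independent rule in the GOSSIP model is sketched below; this particular adopt-on-failure dynamic is illustrative and not necessarily one of the paper's protocols.

```python
import numpy as np

def gossip_round(opinions, arm_means, rng):
    """One synchronous round: each agent pulls the arm it currently favors
    (Bernoulli reward), then contacts one uniformly random agent and adopts
    that agent's arm if its own pull failed. No memory beyond `opinions`.
    opinions: (N,) int array of arm indices; arm_means: (K,) Bernoulli means."""
    n = len(opinions)
    rewards = rng.random(n) < arm_means[opinions]  # Bernoulli pulls
    partners = rng.integers(n, size=n)             # uniform gossip contacts
    return np.where(rewards, opinions, opinions[partners])
```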
arXiv Detail & Related papers (2023-06-14T17:59:15Z)
- Cooperative Thresholded Lasso for Sparse Linear Bandit [6.52540785559241]
We present a novel approach to address the multi-agent sparse contextual linear bandit problem.
It is the first algorithm that tackles row-wise distributed data in sparse linear bandits.
It is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret.
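The thresholded-Lasso step itself is easy to sketch; the hyperparameters are illustrative and the row-wise distribution and communication machinery are omitted.

```python
import numpy as np
from sklearn.linear_model import Lasso

def thresholded_lasso_support(X, y, alpha=0.1, threshold=0.05):
    """Estimate the support of a sparse reward parameter: fit a Lasso on
    the (pooled) context/reward data, then keep only coordinates whose
    fitted coefficients clear the threshold. A low-dimensional bandit can
    then run on the returned coordinates."""
    coef = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    return np.flatnonzero(np.abs(coef) > threshold)
```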
arXiv Detail & Related papers (2023-05-30T16:05:44Z)
- A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits [77.09836892653176]
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server.
We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication.
We prove that the regret of \texttt{FedLinUCB} is bounded by $\tilde{O}\big(d\sqrt{\sum_{m=1}^{M} T_m}\big)$ and the communication complexity is $\tilde{O}(dM^2)$.
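A common way to realize this kind of asynchronous, agent-initiated communication in federated linear bandits is a determinant-ratio trigger; whether \texttt{FedLinUCB} uses exactly this rule is an assumption here.

```python
import numpy as np

def should_sync(V_local, V_last_sync, threshold=2.0):
    """Contact the server only when the local Gram matrix has accumulated
    enough new information since the last sync, measured by the growth of
    log det(V). Each agent checks this locally, so one agent's
    communication never triggers another's."""
    _, logdet_now = np.linalg.slogdet(V_local)
    _, logdet_old = np.linalg.slogdet(V_last_sync)
    return (logdet_now - logdet_old) > np.log(threshold)
```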
arXiv Detail & Related papers (2022-07-07T06:16:19Z)
- Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k (\ll d)$ dimensional linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
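Concretely, the shared-representation assumption behind this bound can be written out; the symbols $B$, $w_m$, and $\eta_{m,t}$ below are our notation, not the paper's:
$$\theta_m = B\,w_m, \qquad B \in \mathbb{R}^{d \times k},\ \ w_m \in \mathbb{R}^{k}, \qquad r_{m,t} = x_{m,t}^\top B\,w_m + \eta_{m,t},$$
so, roughly, the first regret term $d\sqrt{kMT}$ reflects learning the shared $dk$-parameter matrix $B$ from all $MT$ pulls, while the second term $kM\sqrt{T}$ reflects solving $M$ separate $k$-dimensional bandit problems.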
arXiv Detail & Related papers (2022-03-29T15:27:13Z)
- Cooperative Online Learning in Stochastic and Adversarial MDPs [50.62439652257712]
We study cooperative online learning in stochastic and adversarial Markov decision processes (MDPs).
In each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret.
We are the first to consider cooperative reinforcement learning (RL) either with non-fresh randomness or in adversarial MDPs.
arXiv Detail & Related papers (2022-01-31T12:32:11Z)
- Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations [102.32996053572144]
We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy independent evaluations of the true reward of each arm.
We derive different algorithmic approaches and theoretical guarantees depending on how the evaluations are generated.
arXiv Detail & Related papers (2021-12-13T09:48:54Z)
- Decentralized Multi-Agent Linear Bandits with Safety Constraints [31.67685495996986]
We study decentralized linear bandits, where a network of $N$ agents acts cooperatively to solve a linear bandit-optimization problem.
We propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
We show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits.
arXiv Detail & Related papers (2020-12-01T07:33:00Z)
- Distributed Bandits: Probabilistic Communication on $d$-regular Graphs [5.33024001730262]
We study the decentralized multi-agent multi-armed bandit problem for agents that communicate probabilistically over a network defined by a $d$-regular graph.
We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret.
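One plausible form of the communication step is pairwise averaging over randomly activated edges; the merge rule below is illustrative rather than the paper's exact update.

```python
import numpy as np

def gossip_on_random_edges(estimates, edges, p, rng):
    """Each edge (i, j) of the d-regular graph fires independently with
    probability p this round; firing endpoints average their per-arm
    reward estimates. Sequential pairwise averaging keeps the global mean
    intact. estimates: (N, K) array; edges: list of (i, j) index pairs."""
    for i, j in edges:
        if rng.random() < p:
            avg = 0.5 * (estimates[i] + estimates[j])
            estimates[i] = avg
            estimates[j] = avg
    return estimates
```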
arXiv Detail & Related papers (2020-11-16T04:53:54Z)