Robust Multi-Agent Bandits Over Undirected Graphs
- URL: http://arxiv.org/abs/2203.00076v1
- Date: Mon, 28 Feb 2022 20:21:55 GMT
- Title: Robust Multi-Agent Bandits Over Undirected Graphs
- Authors: Daniel Vial, Sanjay Shakkottai, R. Srikant
- Abstract summary: We consider a multi-agent multi-armed bandit setting in which honest agents collaborate over a network to minimize regret.
We show the situation is murkier beyond the case of a complete graph.
We propose a new algorithm for which the $i$-th agent has regret $O((d_{\text{mal}}(i) + K/n)\log(T)/\Delta)$ on any connected and undirected graph.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a multi-agent multi-armed bandit setting in which $n$ honest
agents collaborate over a network to minimize regret but $m$ malicious agents
can disrupt learning arbitrarily. Assuming the network is the complete graph,
existing algorithms incur $O( (m + K/n) \log (T) / \Delta )$ regret in this
setting, where $K$ is the number of arms and $\Delta$ is the arm gap. For $m
\ll K$, this improves over the single-agent baseline regret of
$O(K\log(T)/\Delta)$.
In this work, we show the situation is murkier beyond the case of a complete
graph. In particular, we prove that if the state-of-the-art algorithm is used
on the undirected line graph, honest agents can suffer (nearly) linear regret
until time is doubly exponential in $K$ and $n$. In light of this negative
result, we propose a new algorithm for which the $i$-th agent has regret $O( (
d_{\text{mal}}(i) + K/n) \log(T)/\Delta)$ on any connected and undirected
graph, where $d_{\text{mal}}(i)$ is the number of $i$'s neighbors who are
malicious. Thus, we generalize existing regret bounds beyond the complete graph
(where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is
entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents
directly connected to $i$ affect its long-term regret).
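
To make the setting concrete, below is a minimal Python simulation sketch, assuming a small 5-cycle graph, a simple gossip rule, and UCB1 as the base learner. It is an illustration of the problem, not the paper's algorithm; the graph, gossip period, and helper names (`PHASE`, `recommend`, `ucb_arm`) are hypothetical choices.

```python
# Minimal sketch of the setting in the abstract (NOT the paper's algorithm):
# honest agents run UCB1 over small "active" arm sets, initialized as a
# roughly K/n-sized partition of the arms, and periodically add an arm
# recommended by a random neighbor; the malicious agent recommends junk arms.
import math
import random

random.seed(0)

K, T, PHASE = 20, 20000, 500          # arms, horizon, gossip period (assumed)
means = [0.5] * K
means[0] = 0.9                        # arm 0 optimal; arm gap Delta = 0.4

# Undirected 5-cycle 0-1-2-3-4-0 with agent 2 malicious, so
# d_mal(1) = d_mal(3) = 1 while d_mal(0) = d_mal(4) = 0.
neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
malicious = {2}
honest = [i for i in neighbors if i not in malicious]

state = {}
for idx, i in enumerate(honest):
    state[i] = {
        "active": set(range(idx, K, len(honest))),  # ~K/n arms per agent
        "cnt": [0] * K, "sum": [0.0] * K, "regret": 0.0,
    }

def recommend(j):
    """Honest agents report their empirically best active arm;
    malicious agents push an arbitrary suboptimal arm."""
    if j in malicious:
        return random.randrange(1, K)
    s = state[j]
    return max(s["active"], key=lambda a: s["sum"][a] / max(1, s["cnt"][a]))

def ucb_arm(s, t):
    """Standard UCB1 index, restricted to the agent's active set."""
    for a in s["active"]:
        if s["cnt"][a] == 0:
            return a
    return max(s["active"],
               key=lambda a: s["sum"][a] / s["cnt"][a]
               + math.sqrt(2 * math.log(t) / s["cnt"][a]))

for t in range(1, T + 1):
    for i in honest:
        s = state[i]
        a = ucb_arm(s, t)
        s["cnt"][a] += 1
        s["sum"][a] += 1.0 if random.random() < means[a] else 0.0
        s["regret"] += means[0] - means[a]
        if t % PHASE == 0:            # gossip with one random neighbor
            s["active"].add(recommend(random.choice(neighbors[i])))

for i in honest:
    d_mal = len(set(neighbors[i]) & malicious)
    print(f"agent {i}: d_mal = {d_mal}, regret = {state[i]['regret']:.0f}")
```

In typical runs, agents 1 and 3 (one malicious neighbor each) accumulate more junk arms, and hence more exploration cost, than agents 0 and 4, loosely mirroring the local $d_{\text{mal}}(i)$ dependence in the bound above.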
Related papers
- On Interpolating Experts and Multi-Armed Bandits
We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI.
As consequences, we obtain tight minimax regret bounds for several families of feedback graphs.
arXiv Detail & Related papers (2023-07-14T10:38:30Z) - Fair Multi-Agent Bandits
We provide an algorithm with regret $O\left(N^3 \log\frac{B}{\Delta} f(\log T) \log T\right)$, where $f(t)$ is any function diverging to infinity with $t$.
This significantly improves previous results, which had the same upper bound on the regret of order $O(f(\log T)\log T)$ but an exponential dependence on the number of agents.
arXiv Detail & Related papers (2023-06-07T15:05:53Z) - Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define the mainAlgorithm algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - UniRank: Unimodal Bandit Algorithm for Online Ranking
We create an algorithm with an expected regret matching $O\left(\frac{L\log(L)}{\Delta}\log(T)\right)$ with $2L$ players, horizon $T$, and a minimum reward gap $\Delta$.
Experiments corroborate these theoretical results in practice.
arXiv Detail & Related papers (2022-08-02T15:01:33Z) - Unimodal Mono-Partite Matching in a Bandit Setting
We create an algorithm with an expected regret matching $O\left(\frac{L\log(L)}{\Delta}\log(T)\right)$ with $2L$ players, horizon $T$, and a minimum reward gap $\Delta$.
Experiments corroborate these theoretical results in practice.
arXiv Detail & Related papers (2022-08-02T14:55:50Z) - Nearly Minimax Algorithms for Linear Bandits with Shared Representation
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k\ (\ll d)$ dimensional linear representation.
We come up with novel algorithms that achieve $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds, which match the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$, as long as certain conditions hold.
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Logarithmic Regret from Sublinear Hints
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Differentially Private Multi-Armed Bandits in the Shuffle Model
We give an $(\varepsilon,\delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model.
Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
arXiv Detail & Related papers (2021-06-05T14:11:01Z) - Communication Efficient Parallel Reinforcement Learning
We consider the problem where $M$ agents interact with $M$ identical and independent environments with $S$ states and $A$ actions.
We aim to find an algorithm that allows the agents to minimize the regret with infrequent communication rounds.
arXiv Detail & Related papers (2021-02-22T02:46:36Z) - Impact of Representation Learning in Linear Bandits
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.