Robust Multi-Agent Bandits Over Undirected Graphs
- URL: http://arxiv.org/abs/2203.00076v1
- Date: Mon, 28 Feb 2022 20:21:55 GMT
- Title: Robust Multi-Agent Bandits Over Undirected Graphs
- Authors: Daniel Vial, Sanjay Shakkottai, R. Srikant
- Abstract summary: We consider a multi-agent multi-armed bandit setting in which honest agents collaborate over a network to minimize regret.
We show the situation is murkier beyond the case of a complete graph.
We propose a new algorithm for which the $i$-th agent has regret $O((d_{\text{mal}}(i) + K/n)\log(T)/\Delta)$ on any connected and undirected graph.
- Score: 26.26185074977412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a multi-agent multi-armed bandit setting in which $n$ honest
agents collaborate over a network to minimize regret but $m$ malicious agents
can disrupt learning arbitrarily. Assuming the network is the complete graph,
existing algorithms incur $O( (m + K/n) \log (T) / \Delta )$ regret in this
setting, where $K$ is the number of arms and $\Delta$ is the arm gap. For $m
\ll K$, this improves over the single-agent baseline regret of
$O(K\log(T)/\Delta)$.
In this work, we show the situation is murkier beyond the case of a complete
graph. In particular, we prove that if the state-of-the-art algorithm is used
on the undirected line graph, honest agents can suffer (nearly) linear regret
until time is doubly exponential in $K$ and $n$. In light of this negative
result, we propose a new algorithm for which the $i$-th agent has regret $O( (
d_{\text{mal}}(i) + K/n) \log(T)/\Delta)$ on any connected and undirected
graph, where $d_{\text{mal}}(i)$ is the number of $i$'s neighbors who are
malicious. Thus, we generalize existing regret bounds beyond the complete graph
(where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is
entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents
directly connected to $i$ affect its long-term regret).
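To make the setting concrete, below is a minimal toy simulation of the collaborative structure described above: $n$ honest agents on an undirected line graph, each running UCB1 over an active set of roughly $K/n$ arms and periodically adopting a neighbor's most-played arm, with one agent also receiving arbitrary bad recommendations from a malicious neighbor. This is an illustrative sketch, not the authors' algorithm; the gossip period, graph size, and malicious behavior are all assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: K Bernoulli arms, n honest agents on an undirected line graph.
K, T, n = 50, 20000, 5
means = rng.uniform(0.0, 1.0, K)
best = int(means.argmax())
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

# Each honest agent starts with roughly a K/n share of the arms.
active = [set(s) for s in np.array_split(rng.permutation(K), n)]
counts = [np.zeros(K) for _ in range(n)]
sums = [np.zeros(K) for _ in range(n)]
regret = [0.0] * n

for t in range(1, T + 1):
    for i in range(n):
        arms = list(active[i])
        untried = [a for a in arms if counts[i][a] == 0]
        if untried:                      # play each new arm once
            a = untried[0]
        else:                            # UCB1 index over the active set
            a = max(arms, key=lambda b: sums[i][b] / counts[i][b]
                    + np.sqrt(2 * np.log(t) / counts[i][b]))
        r = rng.binomial(1, means[a])
        counts[i][a] += 1
        sums[i][a] += r
        regret[i] += means[best] - means[a]
    if t % 500 == 0:                     # gossip phase (period is arbitrary)
        for i in range(n):
            j = int(rng.choice(neighbors[i]))
            active[i].add(int(counts[j].argmax()))  # honest recommendation
            if i == 0:                   # agent 0 has a malicious neighbor
                active[i].add(int(means.argmin()))  # who pushes a bad arm

print("per-agent regret:", [round(x, 1) for x in regret])
```

In runs like this, the best arm tends to spread along the line through gossip, and only the agent adjacent to the malicious neighbor pays a noticeable extra exploration cost, mirroring the local $d_{\text{mal}}(i)$ dependence in the bound above. The paper's actual algorithm additionally has to block or down-weight untrustworthy recommendations to obtain the stated guarantee.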
Related papers
- Online Inverse Linear Optimization: Improved Regret Bound, Robustness to Suboptimality, and Toward Tight Regret Analysis [25.50155563108198]
We study an online learning problem where, over $T$ rounds, a learner observes both time-varying sets of feasible actions and an agent's optimal actions.
We obtain an $O(n\ln T)$ regret bound, improving upon the previous bound of $O(n^4\ln T)$ by a factor of $n^3$.
arXiv Detail & Related papers (2025-01-24T09:19:15Z) - p-Mean Regret for Stochastic Bandits [52.828710025519996]
We introduce a simple, unified UCB-based algorithm that achieves novel $p$-mean regret bounds.
Our framework encompasses both average cumulative regret and Nash regret as special cases.
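For context, the power-mean definition presumably behind the name (a standard definition assumed here, not quoted from the paper) is
$$ M_p(x_1,\dots,x_T) = \Bigl(\frac{1}{T}\sum_{t=1}^{T} x_t^{\,p}\Bigr)^{1/p}, \qquad \lim_{p\to 0} M_p = \Bigl(\prod_{t=1}^{T} x_t\Bigr)^{1/T}, $$
so $p = 1$ recovers the arithmetic mean (average cumulative regret) and the $p \to 0$ limit recovers the geometric mean underlying Nash regret.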
arXiv Detail & Related papers (2024-12-14T08:38:26Z) - Fair Multi-Agent Bandits [14.614647884175657]
We provide an algorithm with regret $O\left(N^3 \log \frac{B}{\Delta} f(\log T) \log T\right)$, where $f(t)$ is any function diverging to infinity with $t$.
This significantly improves previous results, which had the same $O(f(\log T)\log T)$ upper bound on the regret but an exponential dependence on the number of agents.
arXiv Detail & Related papers (2023-06-07T15:05:53Z) - Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - UniRank: Unimodal Bandit Algorithm for Online Ranking [0.0]
We create an algorithm with an expected regret of $O(\frac{L\log(L)}{\Delta}\log(T))$ with $2L$ players, horizon $T$, and a minimum reward gap $\Delta$.
Experimental results corroborate these theoretical findings in practice.
arXiv Detail & Related papers (2022-08-02T15:01:33Z) - Unimodal Mono-Partite Matching in a Bandit Setting [0.0]
We create an algorithm with an expected regret of $O(\frac{L\log(L)}{\Delta}\log(T))$ with $2L$ players, horizon $T$, and a minimum reward gap $\Delta$.
Experimental results corroborate these theoretical findings in practice.
arXiv Detail & Related papers (2022-08-02T14:55:50Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{dST})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long...
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
We give an $(\varepsilon,\delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model.
Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
arXiv Detail & Related papers (2021-06-05T14:11:01Z) - Communication Efficient Parallel Reinforcement Learning [34.77250498401055]
We consider the problem where $M$ agents interact with $M$ identical and independent environments with $S$ states and $A$ actions.
We aim to find an algorithm that allows the agents to minimize the regret with infrequent communication rounds.
arXiv Detail & Related papers (2021-02-22T02:46:36Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z)