Cooperative Multi-Agent Bandits with Heavy Tails
- URL: http://arxiv.org/abs/2008.06244v1
- Date: Fri, 14 Aug 2020 08:34:32 GMT
- Title: Cooperative Multi-Agent Bandits with Heavy Tails
- Authors: Abhimanyu Dubey and Alex Pentland
- Abstract summary: We study the heavy-tailed bandit problem in the cooperative multi-agent setting, where a group of agents interact with a common bandit problem.
Existing algorithms for the bandit in this setting utilize confidence intervals arising from an averaging-based communication protocol.
We propose textscMP-UCB, a decentralized multi-agent algorithm for the cooperative bandit that incorporates robust estimation with a message-passing protocol.
- Score: 15.609414012418043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the heavy-tailed stochastic bandit problem in the cooperative
multi-agent setting, where a group of agents interact with a common bandit
problem, while communicating on a network with delays. Existing algorithms for
the stochastic bandit in this setting utilize confidence intervals arising from
an averaging-based communication protocol known as~\textit{running consensus},
that does not lend itself to robust estimation for heavy-tailed settings. We
propose \textsc{MP-UCB}, a decentralized multi-agent algorithm for the
cooperative stochastic bandit that incorporates robust estimation with a
message-passing protocol. We prove optimal regret bounds for \textsc{MP-UCB}
for several problem settings, and also demonstrate its superiority to existing
methods. Furthermore, we establish the first lower bounds for the cooperative
bandit problem, in addition to providing efficient algorithms for robust bandit
estimation of location.
Related papers
- Multi-Agent Best Arm Identification in Stochastic Linear Bandits [0.7673339435080443]
We study the problem of collaborative best-arm identification in linear bandits under a fixed-budget scenario.
In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear bandit instance in parallel.
We devise the algorithms MaLinBAI-Star and MaLinBAI-Gen for star networks and generic networks respectively.
arXiv Detail & Related papers (2024-11-20T20:09:44Z) - QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits [5.530212768657544]
We study the cooperative $k$-armed bandit problem, where a network of $m$ agents collaborate to find the optimal action.
We provide a black-box reduction that allows us to extend any single-agent bandit algorithm to the multi-agent setting.
arXiv Detail & Related papers (2024-10-31T12:20:36Z) - Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z) - DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speedup a common Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose textttDASA, a Delay-Adaptive algorithm for multi-agent Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z) - Pure Exploration in Asynchronous Federated Bandits [57.02106627533004]
We study the federated pure exploration problem of multi-armed bandits and linear bandits, where $M$ agents cooperatively identify the best arm via communicating with the central server.
We propose the first asynchronous multi-armed bandit and linear bandit algorithms for pure exploration with fixed confidence.
arXiv Detail & Related papers (2023-10-17T06:04:00Z) - On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits [7.23389716633927]
We show that a collaborative multi-agent emphfollow-the-regularized-leader (FTRL) algorithm has an individual regret upper bound that matches the lower bound up to a constant factor.
We also show that an FTRL algorithm with a suitable regularizer is regret optimal with respect to the scaling with the edge-delay parameter.
arXiv Detail & Related papers (2022-11-30T16:46:41Z) - Private and Byzantine-Proof Cooperative Decision-Making [15.609414012418043]
The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit.
In this paper, we investigate the bandit problem under two settings - (a) when the agents wish to make their communication private with respect to the action sequence, and (b) when the agents can be byzantine.
We provide upper-confidence bound algorithms that obtain optimal regret while being (a) differentially-private and (b) private.
Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems.
arXiv Detail & Related papers (2022-05-27T18:03:54Z) - Max-Min Grouped Bandits [48.62520520818357]
We introduce a multi-armed bandit problem termed max-min grouped bandits.
The goal is to find a group whose worst arm has the highest mean reward.
This problem is of interest to applications such as recommendation systems.
arXiv Detail & Related papers (2021-11-17T01:59:15Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is convexally optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
arXiv Detail & Related papers (2020-12-24T05:36:29Z) - Kernel Methods for Cooperative Multi-Agent Contextual Bandits [15.609414012418043]
Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays.
We consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related kernel reproducing Hilbert space (RKHS)
We propose textscCoop- KernelUCB, an algorithm that provides near-optimal bounds on the per-agent regret.
arXiv Detail & Related papers (2020-08-14T07:37:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.