An Instance-Dependent Analysis for the Cooperative Multi-Player
Multi-Armed Bandit
- URL: http://arxiv.org/abs/2111.04873v1
- Date: Mon, 8 Nov 2021 23:38:47 GMT
- Title: An Instance-Dependent Analysis for the Cooperative Multi-Player
Multi-Armed Bandit
- Authors: Aldo Pacchiano, Peter Bartlett, Michael I. Jordan
- Abstract summary: We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits.
First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps.
Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players.
- Score: 93.97385339354318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of information sharing and cooperation in Multi-Player
Multi-Armed bandits. We propose the first algorithm that achieves logarithmic
regret for this problem. Our results are based on two innovations. First, we
show that a simple modification to a successive elimination strategy can be
used to allow the players to estimate their suboptimality gaps, up to constant
factors, in the absence of collisions. Second, we leverage the first result to
design a communication protocol that successfully uses the small reward of
collisions to coordinate among players, while preserving meaningful
instance-dependent logarithmic regret guarantees.
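To make the first innovation concrete, below is a minimal single-player sketch of how a successive-elimination strategy can double as a gap estimator: when an arm is eliminated, its confidence radius at that moment is, up to constant factors, proportional to its suboptimality gap. The sampling model, constants, and names here are illustrative assumptions; this is not the paper's multi-player protocol, which additionally handles collisions and communication.

```python
import math
import random

def successive_elimination(means, horizon, delta=0.05):
    """Single-player successive elimination that also records, for each
    eliminated arm, a confidence-radius-based estimate of its gap.
    `means` is only used to simulate Bernoulli rewards; the learner sees
    samples, not true means. Illustrative sketch, not the paper's exact
    multi-player procedure."""
    K = len(means)
    active = set(range(K))
    counts = [0] * K
    sums = [0.0] * K
    gap_estimates = {}
    t = 0
    while len(active) > 1 and t < horizon:
        # Pull every active arm once per phase.
        for i in list(active):
            sums[i] += float(random.random() < means[i])  # Bernoulli reward
            counts[i] += 1
            t += 1
        # Confidence radius from a standard Hoeffding bound.
        def radius(i):
            return math.sqrt(math.log(4 * K * counts[i] ** 2 / delta)
                             / (2 * counts[i]))
        best_lcb = max(sums[i] / counts[i] - radius(i) for i in active)
        for i in list(active):
            if sums[i] / counts[i] + radius(i) < best_lcb:
                # At elimination time the radius is, up to constants,
                # proportional to the true gap -- the "estimate gaps up
                # to constant factors" idea from the abstract.
                gap_estimates[i] = 2 * radius(i)
                active.remove(i)
    return active, gap_estimates
```

In the paper's multi-player setting, such gap estimates are what the players subsequently communicate via the collision protocol; the sketch above only shows where the estimates come from.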
Related papers
- Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
- Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, OS methods' relative performance degrades, suggesting that OS methods may not perform well in the limit.
arXiv Detail & Related papers (2023-12-19T20:01:42Z)
- Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
We study the $K$-armed dueling bandit problem in both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\left(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\right)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
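As a rough illustration of that two-layer structure, here is a generic exponential-weights meta-algorithm running over black-box base-learners. The `act()`/`update(loss)` interface and the bounded-loss model are assumptions for illustration only; the paper's construction and the specific property its base-learners must satisfy are not reproduced here.

```python
import math

def meta_play(base_learners, loss_fn, T, lr=0.1):
    """Generic two-layer no-regret sketch: exponential weights over
    black-box base-learners. Each base learner is assumed to expose
    act() and update(loss); loss_fn(t, action) returns a loss in [0, 1].
    Illustrative only."""
    n = len(base_learners)
    weights = [1.0] * n
    total_loss = 0.0
    for t in range(T):
        total = sum(weights)
        probs = [w / total for w in weights]        # meta-distribution
        actions = [b.act() for b in base_learners]  # each base proposes
        losses = [loss_fn(t, a) for a in actions]
        total_loss += sum(p * l for p, l in zip(probs, losses))
        for i, b in enumerate(base_learners):
            b.update(losses[i])                     # base-level learning
            weights[i] *= math.exp(-lr * losses[i]) # meta-level learning
    return weights, total_loss
```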
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
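The core IDS rule is simple to state even though computing its ingredients for dueling feedback is the hard part. A deterministic-IDS sketch, with `expected_regret` and `info_gain` left as caller-supplied assumptions:

```python
def ids_choice(actions, expected_regret, info_gain):
    """Pick the action minimizing the information ratio
    regret(a)^2 / info_gain(a) -- the core IDS rule. `expected_regret`
    and `info_gain` are model-dependent callables the caller supplies;
    how to compute them under dueling feedback is the paper's
    contribution and is not reproduced here."""
    def ratio(a):
        g = info_gain(a)
        return float("inf") if g <= 0 else expected_regret(a) ** 2 / g
    return min(actions, key=ratio)
```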
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting where the structure of the answer set differs from that of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
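The semi-bandit feedback model in the first sentence is easy to pin down in code: playing a set of arms reveals a separate noisy reward for every arm in the set, not just their sum. The Gaussian noise model and dict-based interface below are illustrative assumptions.

```python
import random

def semi_bandit_round(chosen_set, arm_means, noise_sd=0.1):
    """Semi-bandit feedback: the agent plays a set of arms and observes
    a separate noisy reward for each chosen arm. Gaussian noise is an
    illustrative assumption."""
    return {arm: arm_means[arm] + random.gauss(0.0, noise_sd)
            for arm in chosen_set}
```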
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions [4.974932889340056]
We consider the cooperative multi-player version of the multi-armed bandit problem.
We show that optimal regret with neither communication nor collisions is achievable for any number of players and arms.
arXiv Detail & Related papers (2020-11-08T03:14:19Z)
- Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [35.923544685900055]
We formulate the problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms.
We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players.
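A hedged sketch of the aggregation idea behind such an index: pool a player's own samples with other players' samples for the same arm, but pay an additive penalty of order $\epsilon$ for the possible bias of foreign samples. The exact RobustAgg$(\epsilon)$ index differs; only the shape of the trade-off is shown.

```python
import math

def aggregated_ucb_index(own_sum, own_n, other_sum, other_n, t, eps):
    """UCB-style index for one arm that pools a player's own samples
    with other players' samples. Since other players' mean rewards may
    differ by up to eps, pooled samples contribute an extra bias term.
    Illustrative sketch, not the paper's exact RobustAgg index."""
    n = own_n + other_n
    if n == 0:
        return float("inf")  # unexplored arms are tried first
    mean = (own_sum + other_sum) / n
    bonus = math.sqrt(2.0 * math.log(max(t, 2)) / n)
    bias = eps * other_n / n  # worst-case heterogeneity bias
    return mean + bonus + bias
```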
arXiv Detail & Related papers (2020-10-29T07:13:28Z)
- Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In this game, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time.
While the cooperative case where players maximize the collective reward has been mostly considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)
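The collision mechanics in the entry above are simple to simulate; a minimal sketch, assuming the environment supplies a `pull_reward(arm)` sampler:

```python
from collections import Counter

def play_round(choices, pull_reward):
    """One round of the multi-player collision model: any arm pulled by
    two or more players yields reward 0 to all of them; a player alone
    on an arm receives that arm's stochastic reward. `pull_reward` is
    an assumed environment-supplied sampler."""
    counts = Counter(choices)
    return [pull_reward(arm) if counts[arm] == 1 else 0.0
            for arm in choices]
```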
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.