The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player
Multi-Armed Bandits with no Communication
- URL: http://arxiv.org/abs/2202.09653v1
- Date: Sat, 19 Feb 2022 18:19:36 GMT
- Title: The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player
Multi-Armed Bandits with no Communication
- Authors: Allen Liu, Mark Sellke
- Abstract summary: We study the multi-player multi-armed bandit problem.
In this problem, $m$ players cooperate to maximize their total reward from $K > m$ arms.
We ask whether it is possible to obtain optimal instance-dependent regret $\tilde{O}(1/\Delta)$ where $\Delta$ is the gap between the $m$-th and $m+1$-st best arms.
- Score: 10.446001329147112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the stochastic multi-player multi-armed bandit problem. In this
problem, $m$ players cooperate to maximize their total reward from $K > m$
arms. However, the players cannot communicate and are penalized (e.g., receive no
reward) if they pull the same arm at the same time. We ask whether it is
possible to obtain optimal instance-dependent regret $\tilde{O}(1/\Delta)$
where $\Delta$ is the gap between the $m$-th and $m+1$-st best arms. Such
guarantees were recently achieved in a model allowing the players to implicitly
communicate through intentional collisions. We show that with no communication
at all, such guarantees are, surprisingly, not achievable. In fact, obtaining
the optimal $\tilde{O}(1/\Delta)$ regret for some regimes of $\Delta$
necessarily implies strictly sub-optimal regret in other regimes. Our main
result is a complete characterization of the Pareto optimal instance-dependent
trade-offs that are possible with no communication. Our algorithm generalizes
that of Bubeck, Budzinski, and the second author and enjoys the same strong
no-collision property, while our lower bound is based on a topological
obstruction and holds even under full information.
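To make the setting concrete, here is a minimal simulation sketch of the collision model described in the abstract. The Bernoulli arms, the specific means, and $m = 2$ players are hypothetical choices for illustration, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: K = 5 Bernoulli arms, m = 2 players.
means = np.array([0.9, 0.7, 0.6, 0.3, 0.1])
m = 2

# The instance gap from the abstract: m-th minus (m+1)-st best arm mean.
sorted_means = np.sort(means)[::-1]
Delta = sorted_means[m - 1] - sorted_means[m]   # here 0.7 - 0.6 = 0.1

def play_round(choices):
    """One round: each player names an arm; any player involved in a
    collision (two or more players on the same arm) receives zero reward."""
    choices = list(choices)
    rewards = np.zeros(len(choices))
    for i, arm in enumerate(choices):
        if choices.count(arm) == 1:                  # arm pulled by exactly one player
            rewards[i] = rng.binomial(1, means[arm])
    return rewards

print(play_round([0, 1]))   # distinct arms: independent Bernoulli draws
print(play_round([2, 2]))   # collision: both players get 0
```

With no communication, each player's arm choices may depend only on its own index and its own observations; the paper asks how small the regret can be, as a function of $\Delta$, under exactly this constraint.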
Related papers
- Heterogeneous Multi-Player Multi-Armed Bandits Robust To Adversarial Attacks [19.184883255588126]
We consider a multi-player bandit setting in the presence of adversaries that attempt to negatively affect the rewards received by the players in the system.
In the event of a collision (more than one player choosing the same arm), all the colliding users receive zero rewards.
The adversaries use collisions to affect the rewards received by the players, i.e., if an adversary attacks an arm, any player choosing that arm will receive zero reward.
arXiv Detail & Related papers (2025-01-21T08:51:23Z)
- Corrupted Learning Dynamics in Games [62.73758165845971]
An equilibrium can be computed at a fast rate of $O(\log T)$ when all players follow the optimistic follow-the-regularized-leader (OFTRL) algorithm.
We present corrupted learning dynamics that adaptively find an equilibrium at a rate that depends on the extent to which each player deviates from the strategy suggested by the prescribed algorithm.
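For readers unfamiliar with OFTRL, the sketch below shows the generic optimistic follow-the-regularized-leader update on the simplex, with an entropy regularizer and the last observed loss as the optimistic prediction. These are standard textbook choices assumed here for illustration; this is not the paper's corrupted dynamics:

```python
import numpy as np

def oftrl_simplex(losses, eta=0.5):
    """Optimistic FTRL over the probability simplex with an entropy
    regularizer; the optimistic guess for the next loss is the most
    recently observed loss (a standard 'recency' prediction)."""
    cum_loss = np.zeros(len(losses[0]))
    prediction = np.zeros_like(cum_loss)
    plays = []
    for loss in losses:
        scores = -eta * (cum_loss + prediction)   # leader plus optimistic term
        x = np.exp(scores - scores.max())
        x /= x.sum()                              # entropic regularizer => softmax
        plays.append(x)
        cum_loss += loss
        prediction = loss                         # predict the next loss repeats the last
    return plays

# A loss sequence that stabilizes: OFTRL locks onto the better action quickly.
stream = [np.array([1.0, 0.0])] * 5 + [np.array([0.0, 1.0])] * 5
for x in oftrl_simplex(stream)[-3:]:
    print(np.round(x, 3))
```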
arXiv Detail & Related papers (2024-12-10T02:23:44Z)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only $M$ rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning [105.7510838453122]
We examine online safe multi-agent reinforcement learning using constrained Markov games.
We develop an upper-confidence reinforcement learning algorithm to solve the resulting Lagrangian problem.
Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via a projected gradient step.
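The primal-dual scheme in the last sentence can be sketched generically as below. All names, step sizes, and the entropic mirror map are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def lagrangian_step(x, lam, reward_grad, cost_grad, cost_value, budget,
                    eta_x=0.1, eta_lam=0.1, lam_max=10.0):
    """One generic primal-dual update.
    Primal: online mirror descent with the entropic mirror map
    (exponentiated gradient), ascending the Lagrangian reward - lam * cost.
    Dual: projected gradient ascent on the constraint violation,
    clipped to the interval [0, lam_max]."""
    x_new = x * np.exp(eta_x * (reward_grad - lam * cost_grad))
    x_new /= x_new.sum()                           # projection back onto the simplex
    lam_new = np.clip(lam + eta_lam * (cost_value - budget), 0.0, lam_max)
    return x_new, lam_new

x, lam = np.ones(3) / 3, 0.0                       # uniform policy over 3 actions
r = np.array([1.0, 0.5, 0.2])                      # hypothetical reward gradients
c = np.array([0.9, 0.3, 0.1])                      # hypothetical cost gradients
for _ in range(200):
    x, lam = lagrangian_step(x, lam, r, c, cost_value=float(x @ c), budget=0.4)
print(np.round(x, 3), round(lam, 3))
```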
arXiv Detail & Related papers (2023-05-31T22:09:24Z)
- Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits [6.732901486505047]
Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems.
This paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues.
arXiv Detail & Related papers (2022-12-12T23:26:02Z)
- Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
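For concreteness, both instance parameters can be read off a vector of arm means; the numbers below are hypothetical:

```python
import numpy as np

# Hypothetical arm means: three of the six arms tie for the best value.
means = np.array([0.8, 0.8, 0.8, 0.5, 0.4, 0.4])

best = means.max()
p_star = np.mean(means == best)           # proportion of optimal arms: 0.5
Delta = best - means[means < best].max()  # minimal gap to a sub-optimal arm: 0.3
print(p_star, Delta)
```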
arXiv Detail & Related papers (2021-03-23T11:02:31Z)
- Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distributions and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z)
- Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
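The sketch below shows optimistic reward estimates of the generic UCB1 form, purely to illustrate what modifying the "reward estimates of arms" refers to; it is not the specific C$^2$UCB modification proposed in the paper:

```python
import numpy as np

def ucb_indices(reward_sums, pull_counts, t, alpha=1.5):
    """Generic UCB1-style optimistic reward estimates:
    empirical mean plus a confidence bonus that shrinks with pulls."""
    safe_counts = np.maximum(pull_counts, 1)
    means = reward_sums / safe_counts
    bonus = np.sqrt(alpha * np.log(max(t, 2)) / safe_counts)
    return np.where(pull_counts > 0, means + bonus, np.inf)  # unpulled arms first

print(ucb_indices(np.array([3.0, 1.0]), np.array([10, 2]), t=12))
```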
arXiv Detail & Related papers (2021-01-20T04:29:18Z)
- Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions [4.974932889340056]
We consider the cooperative multi-player version of the multi-armed bandit problem.
We show that optimal regret with neither communication nor collisions is achievable for any number of players and arms.
arXiv Detail & Related papers (2020-11-08T03:14:19Z)
- Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [35.923544685900055]
We formulate the problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms.
We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players.
arXiv Detail & Related papers (2020-10-29T07:13:28Z)
- Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In a game, several players simultaneously pull arms and encounter a collision, with 0 reward, if some of them pull the same arm at the same time.
While the cooperative case, where players maximize the collective reward, has been the main focus, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.