Near-Optimal Collaborative Learning in Bandits
- URL: http://arxiv.org/abs/2206.00121v1
- Date: Tue, 31 May 2022 21:11:47 GMT
- Title: Near-Optimal Collaborative Learning in Bandits
- Authors: Clémence Réda, Sattar Vakili, Emilie Kaufmann
- Abstract summary: This paper introduces a general multi-agent bandit model in which each agent is facing a finite set of arms.
The twist is that the optimal arm for each agent is the arm with largest expected mixed reward, where the mixed reward of an arm is a weighted sum of the rewards of this arm for all agents.
We propose a near-optimal algorithm for pure exploration.
- Score: 15.456561090871244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a general multi-agent bandit model in which each agent
is facing a finite set of arms and may communicate with other agents through a
central controller in order to identify (in pure exploration) or play (in regret
minimization) its optimal arm. The twist is that the optimal arm for
each agent is the arm with largest expected mixed reward, where the mixed
reward of an arm is a weighted sum of the rewards of this arm for all agents.
This makes communication between agents often necessary. This general setting
allows us to recover and extend several recent models for collaborative bandit
learning, including the recently proposed federated learning with
personalization (Shi et al., 2021). In this paper, we provide new lower bounds
on the sample complexity of pure exploration and on the regret. We then propose
a near-optimal algorithm for pure exploration. This algorithm is based on
phased elimination with two novel ingredients: a data-dependent sampling scheme
within each phase, aimed at matching a relaxation of the lower bound.
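To make the mixed-reward objective concrete, here is a minimal sketch in Python; it is not the paper's algorithm or notation, the instance sizes and the names `mu` and `w` are hypothetical, and the true means are taken as known purely for illustration.

```python
import numpy as np

# Hypothetical toy instance: 3 agents, 5 arms (illustrative only).
rng = np.random.default_rng(0)
n_agents, n_arms = 3, 5

# mu[m, k]: expected reward of arm k for agent m (unknown to the learner in practice).
mu = rng.uniform(size=(n_agents, n_arms))

# w[m, n]: weight that agent m places on agent n's rewards; each row sums to 1.
w = rng.dirichlet(np.ones(n_agents), size=n_agents)

# Mixed reward of arm k for agent m: weighted sum of all agents' rewards for that arm,
# i.e. mixed[m, k] = sum_n w[m, n] * mu[n, k].
mixed = w @ mu

# Each agent's optimal arm maximizes its own mixed reward.
optimal_arm = mixed.argmax(axis=1)
print(optimal_arm)
```

Because an agent's mixed reward depends on every agent's means, estimating `mixed[m, k]` generally requires samples collected by other agents, which is why communication through the central controller is often necessary.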
Related papers
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
  (arXiv, 2024-10-08)
- Best Arm Identification with Minimal Regret [55.831935724659175]
This problem setting elegantly amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
  (arXiv, 2024-09-27)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
  (arXiv, 2024-08-20)
- Optimal Regret Bounds for Collaborative Learning in Bandits [10.76667043339504]
We consider regret in a general collaborative multi-agent multi-armed bandit model.
We propose the first algorithm with order optimal regret bounds under this model.
  (arXiv, 2023-12-15)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework in which, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
  (arXiv, 2023-08-29)
- Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits [24.590517939890788]
We study a new collaborative setting, consisting of $N$ agents such that each agent is learning one of $M$ multi-armed bandits.
We develop algorithms which facilitate collaboration between the agents under two scenarios.
  (arXiv, 2023-05-30)
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
  (arXiv, 2023-05-01)
- Doubly Adversarial Federated Bandits [7.23389716633927]
We study a new non-stochastic federated multi-armed bandit problem with multiple agents collaborating via a communication network.
Our algorithm gives a positive answer to an open question proposed in Cesa-Bianchi et al.
  (arXiv, 2023-01-22)
- Distributed Bandits with Heterogeneous Agents [38.90376765616447]
This paper tackles a multi-agent bandit setting where $M$ agents cooperate to solve the same instance of a $K$-armed bandit problem.
We propose two learning algorithms, ucbo and AAE.
We prove that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tilde{\Delta}_i>0} \log T/\tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of
  (arXiv, 2022-01-23)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
  (arXiv, 2021-11-14)
This list is automatically generated from the titles and abstracts of the papers on this site.