Doubly Adversarial Federated Bandits
- URL: http://arxiv.org/abs/2301.09223v2
- Date: Sat, 21 Oct 2023 14:55:29 GMT
- Title: Doubly Adversarial Federated Bandits
- Authors: Jialin Yi and Milan Vojnović
- Abstract summary: We study a new non-stochastic federated multi-armed bandit problem with multiple agents collaborating via a communication network.
Our algorithm gives a positive answer to an open question proposed in Cesa-Bianchi et al. (2016).
- Score: 7.23389716633927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a new non-stochastic federated multi-armed bandit problem with
multiple agents collaborating via a communication network. The losses of the
arms are assigned by an oblivious adversary that specifies the loss of each arm
not only for each time step but also for each agent, which we call "doubly
adversarial". In this setting, different agents may choose the same arm in the
same time step but observe different feedback. The goal of each agent is to
find a globally best arm in hindsight that has the lowest cumulative loss
averaged over all agents, which necessitates communication among agents. We
provide regret lower bounds for any federated bandit algorithm under two
settings: when agents have access to full-information feedback, and when they
have only bandit feedback. For the bandit feedback setting, we propose a near-optimal federated
bandit algorithm called FEDEXP3. Our algorithm gives a positive answer to an
open question proposed in Cesa-Bianchi et al. (2016): FEDEXP3 can guarantee a
sub-linear regret without exchanging sequences of selected arm identities or
loss sequences among agents. We also provide numerical evaluations of our
algorithm to validate our theoretical results and demonstrate its effectiveness
on synthetic and real-world datasets.
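For intuition, below is a minimal, hypothetical sketch of the kind of update a FEDEXP3-style method performs: each agent runs an Exp3 (exponential-weights) update on importance-weighted estimates of its own losses, and a gossip step averages those estimates over the network, so agents never exchange arm identities or raw loss sequences. The ring topology, learning rate, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# A minimal, hypothetical sketch of a gossip-based Exp3 update (illustrative
# only; not the authors' FEDEXP3 code). All parameters below are assumed.

rng = np.random.default_rng(0)
N, K, T, eta = 4, 5, 1000, 0.05           # agents, arms, horizon, learning rate

# Doubly adversarial losses: one loss per (time step, agent, arm).
losses = rng.uniform(size=(T, N, K))
# Globally best arm in hindsight: argmin_k (1/N) * sum_i sum_t losses[t, i, k]

# Doubly stochastic gossip matrix for an assumed ring network of N agents.
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i + 1) % N] = 0.25
    W[i, (i - 1) % N] = 0.25

S = np.zeros((N, K))                       # per-agent cumulative loss estimates
for t in range(T):
    # Exponential weights over each agent's current estimates (stabilized).
    probs = np.exp(-eta * (S - S.min(axis=1, keepdims=True)))
    probs /= probs.sum(axis=1, keepdims=True)
    for i in range(N):
        a = rng.choice(K, p=probs[i])      # agent i pulls an arm, sees only its own loss
        S[i, a] += losses[t, i, a] / probs[i, a]   # importance-weighted estimate
    S = W @ S                              # gossip: mix estimates with neighbours
```

The point of the sketch is only that mixing loss estimates, rather than arm or loss sequences, suffices for collaboration; the actual algorithm tunes the learning rate and communication to match the regret lower bounds.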
Related papers
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
arXiv Detail & Related papers (2024-10-08T09:49:47Z)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of multi-player multi-armed bandit (MAB) model, which captures arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
- Byzantine-Resilient Decentralized Multi-Armed Bandits [25.499420566469098]
This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets.
We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values (see the short sketch after this list).
arXiv Detail & Related papers (2023-10-11T09:09:50Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a stochastic and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits [7.23389716633927]
We show that a collaborative multi-agent follow-the-regularized-leader (FTRL) algorithm has an individual regret upper bound that matches the lower bound up to a constant factor.
We also show that an FTRL algorithm with a suitable regularizer is regret optimal with respect to the scaling with the edge-delay parameter.
arXiv Detail & Related papers (2022-11-30T16:46:41Z)
- Near-Optimal Collaborative Learning in Bandits [15.456561090871244]
This paper introduces a general multi-agent bandit model in which each agent is facing a finite set of arms.
The twist is that the optimal arm for each agent is the arm with the largest expected mixed reward, where the mixed reward of an arm is a weighted sum of the rewards of this arm for all agents.
We propose a near-optimal algorithm for pure exploration.
arXiv Detail & Related papers (2022-05-31T21:11:47Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated with those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z)
- The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits [20.259428328004738]
We consider a decentralized multi-agent Multi Armed Bandit (MAB) setup consisting of $N$ agents.
In our model, agents collaborate by exchanging messages through pairwise gossip style communications on an arbitrary connected graph.
arXiv Detail & Related papers (2020-01-15T17:49:29Z)
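As referenced in the Byzantine-resilient entry above, here is a minimal, hypothetical illustration of truncating extreme values before mixing neighbours' reports; the function name, trimming parameter, and data are assumptions for illustration, not that paper's algorithm.

```python
import numpy as np

# Hypothetical trimmed mixing step (illustrative only): each agent averages
# its neighbours' reported per-arm estimates after discarding the f most
# extreme values at each end, limiting the influence of Byzantine reports.

def trimmed_mix(reports: np.ndarray, f: int) -> np.ndarray:
    """reports: (n_neighbours, n_arms) estimates; f: values trimmed per end."""
    sorted_reports = np.sort(reports, axis=0)      # sort each arm's column
    return sorted_reports[f : len(reports) - f].mean(axis=0)

reports = np.array([[0.40, 0.60],
                    [0.50, 0.70],
                    [0.45, 0.65],
                    [9.00, -9.0]])                 # one Byzantine neighbour
print(trimmed_mix(reports, f=1))                   # extremes are discarded
```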