Multi-Agent Best Arm Identification in Stochastic Linear Bandits
- URL: http://arxiv.org/abs/2411.13690v1
- Date: Wed, 20 Nov 2024 20:09:44 GMT
- Title: Multi-Agent Best Arm Identification in Stochastic Linear Bandits
- Authors: Sanjana Agrawal, Saúl A. Blanco,
- Abstract summary: We study the problem of collaborative best-arm identification in linear bandits under a fixed-budget scenario.
In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear bandit instance in parallel.
We devise the algorithms MaLinBAI-Star and MaLinBAI-Gen for star networks and generic networks respectively.
- Score: 0.7673339435080443
- License:
- Abstract: We study the problem of collaborative best-arm identification in stochastic linear bandits under a fixed-budget scenario. In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear bandit instance in parallel. The objective of the agents is to collaboratively learn the best arm of the given bandit instance with the help of a central server while minimizing the probability of error in best arm estimation. For this purpose, we devise the algorithms MaLinBAI-Star and MaLinBAI-Gen for star networks and generic networks respectively. Both algorithms employ an Upper-Confidence-Bound approach where agents share their knowledge through the central server during each communication round. We demonstrate, both theoretically and empirically, that our algorithms enjoy exponentially decaying probability of error in the allocated time budget. Furthermore, experimental results based on synthetic and real-world data validate the effectiveness of our algorithms over the existing multi-agent algorithms.
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Pure Exploration in Asynchronous Federated Bandits [57.02106627533004]
We study the federated pure exploration problem of multi-armed bandits and linear bandits, where $M$ agents cooperatively identify the best arm via communicating with the central server.
We propose the first asynchronous multi-armed bandit and linear bandit algorithms for pure exploration with fixed confidence.
arXiv Detail & Related papers (2023-10-17T06:04:00Z) - Clustered Multi-Agent Linear Bandits [5.893124686141782]
We address a particular instance of the multi-agent linear bandit problem, called clustered multi-agent linear bandits.
We propose a novel algorithm leveraging an efficient collaboration between the agents in order to accelerate the overall optimization problem.
arXiv Detail & Related papers (2023-09-15T19:01:42Z) - Federated Learning for Heterogeneous Bandits with Unobserved Contexts [0.0]
We study the problem of federated multi-arm contextual bandits with unknown contexts.
We propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions.
arXiv Detail & Related papers (2023-03-29T22:06:24Z) - Communication-Efficient Collaborative Best Arm Identification [6.861971769602314]
We investigate top-$m$ arm identification, a basic problem in bandit theory, in a multi-agent learning model in which agents collaborate to learn an objective function.
We are interested in designing collaborative learning algorithms that achieve maximum speedup.
arXiv Detail & Related papers (2022-08-18T19:02:29Z) - Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
$alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Detection of Insider Attacks in Distributed Projected Subgradient
Algorithms [11.096339082411882]
We show that a general neural network is particularly suitable for detecting and localizing malicious agents.
We propose to adopt one of the state-of-art approaches in federated learning, i.e., a collaborative peer-to-peer machine learning protocol.
In our simulations, a least-squared problem is considered to verify the feasibility and effectiveness of AI-based methods.
arXiv Detail & Related papers (2021-01-18T08:01:06Z) - A Low Complexity Decentralized Neural Net with Centralized Equivalence
using Layer-wise Learning [49.15799302636519]
We design a low complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers)
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.