Related papers: Multi-Agent Best Arm Identification in Stochastic Linear Bandits

URL: http://arxiv.org/abs/2411.13690v1
Date: Wed, 20 Nov 2024 20:09:44 GMT
Title: Multi-Agent Best Arm Identification in Stochastic Linear Bandits
Authors: Sanjana Agrawal, Saúl A. Blanco
Abstract summary: We study the problem of collaborative best-arm identification in linear bandits under a fixed-budget scenario. In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear bandit instance in parallel. We devise the algorithms MaLinBAI-Star and MaLinBAI-Gen for star networks and generic networks respectively.
Score: 0.7673339435080443
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We study the problem of collaborative best-arm identification in stochastic linear bandits under a fixed-budget scenario. In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear bandit instance in parallel. The objective of the agents is to collaboratively learn the best arm of the given bandit instance with the help of a central server while minimizing the probability of error in best arm estimation. For this purpose, we devise the algorithms MaLinBAI-Star and MaLinBAI-Gen for star networks and generic networks respectively. Both algorithms employ an Upper-Confidence-Bound approach where agents share their knowledge through the central server during each communication round. We demonstrate, both theoretically and empirically, that our algorithms enjoy exponentially decaying probability of error in the allocated time budget. Furthermore, experimental results based on synthetic and real-world data validate the effectiveness of our algorithms over the existing multi-agent algorithms.

Related papers

Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability. We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z)
Scalable Decentralized Algorithms for Online Personalized Mean Estimation [12.002609934938224]
This study focuses on a simplified version of the overarching problem, where each agent collects samples from a real-valued distribution over time to estimate its mean. We introduce two collaborative mean estimation algorithms: one draws inspiration from belief propagation, while the other employs a consensus-based approach.
arXiv Detail & Related papers (2024-02-20T08:30:46Z)
Interacting Particle Systems on Networks: joint inference of the network and the interaction kernel [8.535430501710712]
We jointly infer the weight matrix of the network and the interaction kernel, which determine the rules of the interactions between agents. We use two algorithms, one of which is a new algorithm named operator regression with alternating least squares. Both algorithms are scalable, and we give conditions guaranteeing identifiability and well-posedness.
arXiv Detail & Related papers (2024-02-13T12:29:38Z)
Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
Pure Exploration in Asynchronous Federated Bandits [57.02106627533004]
We study the federated pure exploration problem of multi-armed bandits and linear bandits, where $M$ agents cooperatively identify the best arm via communicating with the central server. We propose the first asynchronous multi-armed bandit and linear bandit algorithms for pure exploration with fixed confidence.
arXiv Detail & Related papers (2023-10-17T06:04:00Z)
Clustered Multi-Agent Linear Bandits [5.893124686141782]
We address a particular instance of the multi-agent linear bandit problem, called clustered multi-agent linear bandits. We propose a novel algorithm leveraging an efficient collaboration between the agents in order to accelerate the overall optimization problem.
arXiv Detail & Related papers (2023-09-15T19:01:42Z)
Federated Learning for Heterogeneous Bandits with Unobserved Contexts [0.0]
We study the problem of federated multi-arm contextual bandits with unknown contexts. We propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions.
arXiv Detail & Related papers (2023-03-29T22:06:24Z)
Communication-Efficient Collaborative Best Arm Identification [6.861971769602314]
We investigate top-$m$ arm identification, a basic problem in bandit theory, in a multi-agent learning model in which agents collaborate to learn an objective function. We are interested in designing collaborative learning algorithms that achieve maximum speedup.
arXiv Detail & Related papers (2022-08-18T19:02:29Z)
On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models. Most existing algorithms are restricted to the single-machine setting, so they are incapable of handling distributed data. We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradients.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server. An $\alpha$-fraction of agents are adversarial and can report arbitrary fake information. We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback. It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
Detection of Insider Attacks in Distributed Projected Subgradient Algorithms [11.096339082411882]
We show that a general neural network is particularly suitable for detecting and localizing malicious agents. We propose to adopt one of the state-of-the-art approaches in federated learning, i.e., a collaborative peer-to-peer machine learning protocol. In our simulations, a least-squares problem is considered to verify the feasibility and effectiveness of AI-based methods.
arXiv Detail & Related papers (2021-01-18T08:01:06Z)
A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers). In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns. We show that it is possible to achieve equivalent learning performance as if the data were available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
Kernel Methods for Cooperative Multi-Agent Contextual Bandits [15.609414012418043]
Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays. We consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS). We propose Coop-KernelUCB, an algorithm that provides near-optimal bounds on the per-agent regret.
arXiv Detail & Related papers (2020-08-14T07:37:44Z)
Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds. Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search. We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.