Near Optimal Best Arm Identification for Clustered Bandits
- URL: http://arxiv.org/abs/2505.10147v1
- Date: Thu, 15 May 2025 10:20:26 GMT
- Title: Near Optimal Best Arm Identification for Clustered Bandits
- Authors: Yash, Nikhil Karamchandani, Avishek Ghosh
- Abstract summary: We consider $N$ agents grouped into $M$ clusters, where each cluster solves a stochastic bandit problem. We propose two novel algorithms: Clustering then Best Arm Identification (Cl-BAI) and Best Arm Identification then Clustering (BAI-Cl). Cl-BAI uses a two-phase approach that first clusters agents based on the bandit problems they are learning, followed by identifying the best arm for each cluster. BAI-Cl reverses the sequence by identifying the best arms first and then clustering agents accordingly.
- Score: 10.146588539113614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work investigates the problem of best arm identification for multi-agent multi-armed bandits. We consider $N$ agents grouped into $M$ clusters, where each cluster solves a stochastic bandit problem. The mapping between agents and bandits is a priori unknown. Each bandit is associated with $K$ arms, and the goal is to identify the best arm for each agent under a $\delta$-probably correct ($\delta$-PC) framework, while minimizing sample complexity and communication overhead. We propose two novel algorithms: Clustering then Best Arm Identification (Cl-BAI) and Best Arm Identification then Clustering (BAI-Cl). Cl-BAI uses a two-phase approach that first clusters agents based on the bandit problems they are learning, followed by identifying the best arm for each cluster. BAI-Cl reverses the sequence by identifying the best arms first and then clustering agents accordingly. Both algorithms leverage the successive elimination framework to ensure computational efficiency and high accuracy. We establish $\delta$-PC guarantees for both methods, derive bounds on their sample complexity, and provide a lower bound for this problem class. Moreover, when $M$ is small (a constant), we show that the sample complexity of a variant of BAI-Cl is minimax optimal in an order-wise sense. Experiments on synthetic and real-world datasets (MovieLens, Yelp) demonstrate the superior performance of the proposed algorithms in terms of sample and communication efficiency, particularly in settings where $M \ll N$.
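Both Cl-BAI and BAI-Cl rely on successive elimination as the core best-arm subroutine. Below is a minimal single-agent sketch of $\delta$-PC best arm identification via successive elimination; the function name, the Hoeffding-style confidence radius, and the Bernoulli test harness are illustrative assumptions, not the paper's exact construction.

```python
import math
import random

def successive_elimination(pull, K, delta):
    """Minimal sketch: return an arm index that is best with probability
    at least 1 - delta, assuming rewards in [0, 1]. `pull(i)` draws one
    sample from arm i. The confidence radius is a standard Hoeffding
    bound union-bounded over arms and rounds (illustrative choice)."""
    active = list(range(K))
    means = [0.0] * K
    t = 0  # number of samples per surviving arm
    while len(active) > 1:
        t += 1
        for i in active:  # sample every surviving arm once per round
            means[i] += (pull(i) - means[i]) / t
        # Radius valid simultaneously for all arms and rounds.
        radius = math.sqrt(math.log(4 * K * t * t / delta) / (2 * t))
        leader = max(means[i] for i in active)
        # Drop arms whose upper confidence bound falls below the
        # empirical leader's lower confidence bound.
        active = [i for i in active if means[i] + radius >= leader - radius]
    return active[0]

# Hypothetical usage: 4 Bernoulli arms, true best arm is index 2.
mus = [0.3, 0.5, 0.7, 0.4]
best = successive_elimination(lambda i: float(random.random() < mus[i]), K=4, delta=0.05)
print(best)  # prints 2 with probability >= 0.95
```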
Related papers
- Heterogeneous Multi-Agent Bandits with Parsimonious Hints [12.709437751500353]
We study a hinted heterogeneous multi-agent multi-armed bandits problem (HMA2B), where agents can query low-cost observations (hints) in addition to pulling arms. In this framework, each of the $M$ agents has a unique reward distribution over $K$ arms, and in $T$ rounds, they can observe the reward of the arm they pull only if no other agent pulls that arm. The goal is to maximize the total utility by querying the minimal necessary hints without pulling arms, achieving time-independent regret.
arXiv Detail & Related papers (2025-02-22T07:46:41Z)
- An Algorithm for Fixed Budget Best Arm Identification with Combinatorial Exploration [3.9901365062418312]
We consider the best arm identification problem in the $K$-armed bandit framework. The agent is allowed to play a subset of arms at each time slot instead of a single arm. We propose an algorithm that constructs $\log K$ groups and performs a likelihood ratio test to detect the presence of the best arm (a sketch of the grouping idea follows this entry).
arXiv Detail & Related papers (2025-02-03T15:10:08Z)
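One standard way to realize the $\log K$ grouping above is to index arms with binary codes, so that group $b$ pools exactly the arms whose $b$-th bit is set; the membership pattern then uniquely identifies any single arm. The helper below is a hypothetical illustration of this construction, not the paper's algorithm.

```python
import math

def bit_groups(K):
    """Assign K arms to ceil(log2 K) groups: group b contains every arm
    whose b-th bit is 1. Each arm has a unique group-membership pattern,
    so per-group tests can pinpoint the best arm's index bit by bit."""
    n_groups = max(1, math.ceil(math.log2(K)))
    return [[arm for arm in range(K) if (arm >> b) & 1] for b in range(n_groups)]

print(bit_groups(8))  # [[1, 3, 5, 7], [2, 3, 6, 7], [4, 5, 6, 7]]
```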
- Online Clustering with Bandit Information [5.024813922014978]
We study the problem of online clustering within the multi-armed bandit framework under the fixed confidence setting. We introduce a novel algorithm, Average Tracking Bandit Online Clustering (ATBOC), and prove that it is order optimal. We also propose a more efficient algorithm, Lower and Upper Confidence Bound-based Bandit Online Clustering (LUCBBOC), inspired by the LUCB algorithm for best arm identification (see the sketch after this entry).
arXiv Detail & Related papers (2025-01-20T11:39:09Z)
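For reference, the LUCB principle that LUCBBOC draws on repeatedly samples the empirical leader and its strongest challenger, stopping once their confidence intervals separate. Below is a minimal single-bandit sketch; the exploration rate and names are illustrative assumptions, not the LUCBBOC algorithm itself.

```python
import math
import random

def lucb_best_arm(pull, K, delta):
    """LUCB-style fixed-confidence best-arm identification: each step
    samples the empirical leader and the highest-UCB challenger, and
    stops when the leader's LCB clears the challenger's UCB.
    Assumes rewards in [0, 1] and K >= 2."""
    means = [pull(i) for i in range(K)]  # one initial sample per arm
    counts = [1] * K
    t = K
    def radius(i):
        # Standard LUCB-style exploration rate (illustrative constant).
        return math.sqrt(math.log(5 * K * t ** 4 / (4 * delta)) / (2 * counts[i]))
    while True:
        leader = max(range(K), key=lambda i: means[i])
        challenger = max((i for i in range(K) if i != leader),
                         key=lambda i: means[i] + radius(i))
        if means[leader] - radius(leader) >= means[challenger] + radius(challenger):
            return leader
        for i in (leader, challenger):  # sample both, update running means
            means[i] += (pull(i) - means[i]) / (counts[i] + 1)
            counts[i] += 1
            t += 1

mus = [0.2, 0.45, 0.6]
print(lucb_best_arm(lambda i: float(random.random() < mus[i]), K=3, delta=0.05))
```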
- Representative Arm Identification: A fixed confidence approach to identify cluster representatives [7.459521930846415]
We study the representative arm identification (RAI) problem in the multi-armed bandits (MAB) framework.
The RAI problem covers as special cases several well-studied MAB problems such as identifying the best arm or any $M$ out of the top $K$.
We propose two algorithms, based on the idea of confidence intervals, and provide high probability upper bounds on their sample complexity.
arXiv Detail & Related papers (2024-08-26T11:47:52Z)
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward, with a specified accuracy, as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (a less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z)
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized explore-then-commit approach and is specifically designed for scenarios with a large set of base arms (a generic explore-then-commit sketch follows this entry).
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
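For context, a plain single-arm explore-then-commit loop, the template that SGB optimizes, can be sketched as follows. SGB's greedy selection over arm sets and its tuned exploration schedule are more involved; the exploration length `m` here is an illustrative free parameter.

```python
import random

def explore_then_commit(pull, K, T, m):
    """Plain explore-then-commit: pull each arm m times, then commit to
    the empirical best arm for the remaining rounds. Returns the total
    reward collected over T pulls."""
    means = [0.0] * K
    total = 0.0
    for i in range(K):          # exploration phase
        for _ in range(m):
            r = pull(i)
            means[i] += r / m
            total += r
    best = max(range(K), key=lambda i: means[i])
    for _ in range(T - K * m):  # commit phase
        total += pull(best)
    return total

mus = [0.3, 0.5, 0.7]
print(explore_then_commit(lambda i: float(random.random() < mus[i]), K=3, T=1000, m=50))
```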
- Optimal Algorithms for Latent Bandits with Cluster Structure [50.44722775727619]
We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem.
We propose LATTICE, which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$.
arXiv Detail & Related papers (2023-01-17T17:49:04Z)
- On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems.
Arms consist of two components: one that is shared across tasks (which we call the representation) and one that is task-specific (which we call the predictor).
We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound, and scales at most as $H(G\log(\delta_G) + X\log(\delta_H))$, with $X, G, H$ being, respectively, the number of tasks, representations, and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback and proposes an algorithm, Bandit Online Clustering (BOC).
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Distributed Bandits with Heterogeneous Agents [38.90376765616447]
This paper tackles a multi-agent bandit setting where $M$ agents cooperate to solve the same instance of a $K$-armed bandit problem.
We propose two learning algorithms, one UCB-based and one based on active arm elimination (AAE).
We prove that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tilde{\Delta}_i>0} \log T/\tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of arm $i$ and any local optimal arm.
arXiv Detail & Related papers (2022-01-23T20:04:15Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms for best arm identification in sub-Gaussian bandits under reward contamination: a gap-based algorithm and one based on successive elimination.
For the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination algorithm, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.