Closing the Gap on the Sample Complexity of 1-Identification
- URL: http://arxiv.org/abs/2601.15620v1
- Date: Thu, 22 Jan 2026 03:50:31 GMT
- Title: Closing the Gap on the Sample Complexity of 1-Identification
- Authors: Zitian Li, Wang Chi Cheung,
- Abstract summary: 1-identification is a fundamental multi-armed bandit formulation on pure exploration.<n>We derive a new lower bound of $mathbbE$ when there is at least one qualified arm.<n>Our result complements the analysis of $mathbbE$ when there are multiple qualified arms.
- Score: 8.450904497835262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 1-identification is a fundamental multi-armed bandit formulation on pure exploration. An agent aims to determine whether there exists a qualified arm whose mean reward is not less than a known threshold $μ_0$, or to output \textsf{None} if it believes such an arm does not exist. The agent needs to guarantee its output is correct with probability at least $1-δ$, while making expected total pulling times $\mathbb{E}τ$ as small as possible. We work on 1-identification with two main contributions. (1) We utilize an optimization formulation to derive a new lower bound of $\mathbb{E}τ$, when there is at least one qualified arm. (2) We design a new algorithm, deriving tight upper bounds whose gap to lower bounds are up to a polynomial of logarithm factor across all problem instance. Our result complements the analysis of $\mathbb{E}τ$ when there are multiple qualified arms, which is an open problem left by history literature.
Related papers
- Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing [52.124267908936396]
The model is composed of $M$ arms and $K$ plays.<n>Each arm has a number of capacities, and each unit of capacity is associated with a reward function.<n>When multiple plays compete for the arm capacity, the arm capacity is allocated in a larger priority weight first manner.
arXiv Detail & Related papers (2025-12-25T11:19:09Z) - Near Optimal Non-asymptotic Sample Complexity of 1-Identification [5.279257531335345]
We study the 1-identification problem, a fundamental multi-armed bandit formulation on pure exploration.<n>The goal is to determine whether there exists an arm whose mean reward is at least a known threshold $mu_0$, or to output None if it believes such an arm does not exist.<n>We design a new algorithm, and conduct theoretical analysis from the non-asymptotic perspective.
arXiv Detail & Related papers (2025-06-08T03:23:45Z) - Online Fair Division for Personalized $2$-Value Instances [51.278096593080456]
We study an online fair division setting, where goods arrive one at a time and there is a fixed set of $n$ agents.<n>Once a good appears, the value each agent has for it is revealed and it must be allocated immediately and irrevocably to one of the agents.<n>We show how to obtain worst case guarantees with respect to well-known fairness notions.
arXiv Detail & Related papers (2025-05-28T09:48:16Z) - Multi-thresholding Good Arm Identification with Bandit Feedback [1.079960007119637]
We consider a good arm identification problem in a bandit setting with multi-objectives.<n>We propose the Multi-Thresholding UCB(MultiTUCB) algorithm with a sample complexity bound.
arXiv Detail & Related papers (2025-03-13T14:04:04Z) - Heterogeneous Multi-Agent Bandits with Parsimonious Hints [12.709437751500353]
We study a hinted heterogeneous multi-agent multi-armed bandits problem (HMA2B), where agents can query low-cost observations (hints) in addition to pulling arms.<n>In this framework, each of the $M$ agents has a unique reward distribution over $K$ arms, and in $T$ rounds, they can observe the reward of the arm they pull only if no other agent pulls that arm.<n>The goal is to maximize the total utility by querying the minimal necessary hints without pulling arms, achieving time-independent regret.
arXiv Detail & Related papers (2025-02-22T07:46:41Z) - Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits [49.96531901205305]
We analyze $f$-divergence-regularized offline policy learning.<n>For reverse Kullback-Leibler (KL) divergence, we give the first $tildeO(epsilon-1)$ sample complexity under single-policy concentrability.<n>We extend our analysis to dueling bandits, and we believe these results take a significant step toward a comprehensive understanding of $f$-divergence-regularized policy learning.
arXiv Detail & Related papers (2025-02-09T22:14:45Z) - Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.<n>The end goal is to identify the best arm of em every objective in the shortest (expected) time subject to an upper bound on the probability of error.<n>We propose an algorithm that uses the novel idea of em surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z) - A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.<n>We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $delta$.<n>Our refined analysis uncovers a novel bound on the speed at which the average number of arm pulls of our algorithm converges to the fundamental limit as $delta$ vanishes.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Optimal Top-Two Method for Best Arm Identification and Fluid Analysis [15.353009236788262]
We propose an optimal top-2 type algorithm for the best arm identification problem.<n>We show that the proposed algorithm is optimal as $delta rightarrow 0$.
arXiv Detail & Related papers (2024-03-14T06:14:07Z) - Online Learning with Adversaries: A Differential-Inclusion Analysis [52.43460995467893]
We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning with adversaries.
Our main result is that the proposed algorithm almost surely converges to the desired mean $mu.$
We derive this convergence using a novel differential-inclusion-based two-timescale analysis.
arXiv Detail & Related papers (2023-04-04T04:32:29Z) - Max-Quantile Grouped Infinite-Arm Bandits [39.7239093424124]
bandit problem in which there are a number of groups each consisting of infinitely many arms.
We introduce a two-step algorithm that first requests a fixed number of arms from each group and then runs a finite-arm grouped max-quantile bandit algorithm.
We characterize both the instance-dependent and worst-case regret, and provide a matching lower bound for the latter.
arXiv Detail & Related papers (2022-10-04T00:46:49Z) - Best-Arm Identification in Correlated Multi-Armed Bandits [9.180690233707645]
We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms.
We show that the total samples obtained by C-LUCB are of the form $mathcalOleft(sum_k in mathcalC logleft(frac1deltaright)$ as opposed to the typical $mathcalOleft(sum_k in mathcalC logleft(frac1deltaright)$ samples required in the independent reward setting
arXiv Detail & Related papers (2021-09-10T15:41:07Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(ksqrt(A-k+1)T log (|mathcalF|T))$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.