Related papers: Closing the Gap on the Sample Complexity of 1-Identification

Closing the Gap on the Sample Complexity of 1-Identification

URL: http://arxiv.org/abs/2601.15620v1
Date: Thu, 22 Jan 2026 03:50:31 GMT
Title: Closing the Gap on the Sample Complexity of 1-Identification
Authors: Zitian Li, Wang Chi Cheung,
Abstract summary: 1-identification is a fundamental multi-armed bandit formulation on pure exploration.<n>We derive a new lower bound of $mathbbE$ when there is at least one qualified arm.<n>Our result complements the analysis of $mathbbE$ when there are multiple qualified arms.
Score: 8.450904497835262
License: http://creativecommons.org/licenses/by/4.0/
Abstract: 1-identification is a fundamental multi-armed bandit formulation on pure exploration. An agent aims to determine whether there exists a qualified arm whose mean reward is not less than a known threshold $μ_0$, or to output \textsf{None} if it believes such an arm does not exist. The agent needs to guarantee its output is correct with probability at least $1-δ$, while making expected total pulling times $\mathbb{E}τ$ as small as possible. We work on 1-identification with two main contributions. (1) We utilize an optimization formulation to derive a new lower bound of $\mathbb{E}τ$, when there is at least one qualified arm. (2) We design a new algorithm, deriving tight upper bounds whose gap to lower bounds are up to a polynomial of logarithm factor across all problem instance. Our result complements the analysis of $\mathbb{E}τ$ when there are multiple qualified arms, which is an open problem left by history literature.

Related papers

Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing [52.124267908936396]
The model is composed of $M$ arms and $K$ plays.<n>Each arm has a number of capacities, and each unit of capacity is associated with a reward function.<n>When multiple plays compete for the arm capacity, the arm capacity is allocated in a larger priority weight first manner.
arXiv Detail & Related papers (2025-12-25T11:19:09Z)
Near Optimal Non-asymptotic Sample Complexity of 1-Identification [5.279257531335345]
We study the 1-identification problem, a fundamental multi-armed bandit formulation on pure exploration.<n>The goal is to determine whether there exists an arm whose mean reward is at least a known threshold $mu_0$, or to output None if it believes such an arm does not exist.<n>We design a new algorithm, and conduct theoretical analysis from the non-asymptotic perspective.
arXiv Detail & Related papers (2025-06-08T03:23:45Z)
Online Fair Division for Personalized $2$-Value Instances [51.278096593080456]
We study an online fair division setting, where goods arrive one at a time and there is a fixed set of $n$ agents.<n>Once a good appears, the value each agent has for it is revealed and it must be allocated immediately and irrevocably to one of the agents.<n>We show how to obtain worst case guarantees with respect to well-known fairness notions.
arXiv Detail & Related papers (2025-05-28T09:48:16Z)
Multi-thresholding Good Arm Identification with Bandit Feedback [1.079960007119637]
We consider a good arm identification problem in a bandit setting with multi-objectives.<n>We propose the Multi-Thresholding UCB(MultiTUCB) algorithm with a sample complexity bound.
arXiv Detail & Related papers (2025-03-13T14:04:04Z)
Heterogeneous Multi-Agent Bandits with Parsimonious Hints [12.709437751500353]
We study a hinted heterogeneous multi-agent multi-armed bandits problem (HMA2B), where agents can query low-cost observations (hints) in addition to pulling arms.<n>In this framework, each of the $M$ agents has a unique reward distribution over $K$ arms, and in $T$ rounds, they can observe the reward of the arm they pull only if no other agent pulls that arm.<n>The goal is to maximize the total utility by querying the minimal necessary hints without pulling arms, achieving time-independent regret.
arXiv Detail & Related papers (2025-02-22T07:46:41Z)
Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits [49.96531901205305]
We analyze $f$-divergence-regularized offline policy learning.<n>For reverse Kullback-Leibler (KL) divergence, we give the first $tildeO(epsilon-1)$ sample complexity under single-policy concentrability.<n>We extend our analysis to dueling bandits, and we believe these results take a significant step toward a comprehensive understanding of $f$-divergence-regularized policy learning.
arXiv Detail & Related papers (2025-02-09T22:14:45Z)
Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.<n>The end goal is to identify the best arm of em every objective in the shortest (expected) time subject to an upper bound on the probability of error.<n>We propose an algorithm that uses the novel idea of em surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z)
A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.<n>We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $delta$.<n>Our refined analysis uncovers a novel bound on the speed at which the average number of arm pulls of our algorithm converges to the fundamental limit as $delta$ vanishes.
arXiv Detail & Related papers (2024-09-08T12:19:12Z)
Optimal Top-Two Method for Best Arm Identification and Fluid Analysis [15.353009236788262]
We propose an optimal top-2 type algorithm for the best arm identification problem.<n>We show that the proposed algorithm is optimal as $delta rightarrow 0$.
arXiv Detail & Related papers (2024-03-14T06:14:07Z)
Online Learning with Adversaries: A Differential-Inclusion Analysis [52.43460995467893]
We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning with adversaries. Our main result is that the proposed algorithm almost surely converges to the desired mean $mu.$ We derive this convergence using a novel differential-inclusion-based two-timescale analysis.
arXiv Detail & Related papers (2023-04-04T04:32:29Z)
Max-Quantile Grouped Infinite-Arm Bandits [39.7239093424124]
bandit problem in which there are a number of groups each consisting of infinitely many arms. We introduce a two-step algorithm that first requests a fixed number of arms from each group and then runs a finite-arm grouped max-quantile bandit algorithm. We characterize both the instance-dependent and worst-case regret, and provide a matching lower bound for the latter.
arXiv Detail & Related papers (2022-10-04T00:46:49Z)
Best-Arm Identification in Correlated Multi-Armed Bandits [9.180690233707645]
We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms. We show that the total samples obtained by C-LUCB are of the form $mathcalOleft(sum_k in mathcalC logleft(frac1deltaright)$ as opposed to the typical $mathcalOleft(sum_k in mathcalC logleft(frac1deltaright)$ samples required in the independent reward setting
arXiv Detail & Related papers (2021-09-10T15:41:07Z)
Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous. We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy. We show that our algorithm has a regret guarantee of $O(ksqrt(A-k+1)T log (|mathcalF|T))$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.