Minimax Optimal Simple Regret in Two-Armed Best-Arm Identification
- URL: http://arxiv.org/abs/2412.17753v2
- Date: Tue, 21 Jan 2025 19:44:34 GMT
- Title: Minimax Optimal Simple Regret in Two-Armed Best-Arm Identification
- Authors: Masahiro Kato,
- Abstract summary: We prove the minimax optimality of the Neyman allocation for the simple regret.
We show that our optimality result holds without imposing locality restrictions on the local normality.
- Score: 10.470114319701576
- License:
- Abstract: This study investigates an asymptotically minimax optimal algorithm in the two-armed fixed-budget best-arm identification (BAI) problem. Given two treatment arms, the objective is to identify the arm with the highest expected outcome through an adaptive experiment. We focus on the Neyman allocation, where treatment arms are allocated following the ratio of their outcome standard deviations. Our primary contribution is to prove the minimax optimality of the Neyman allocation for the simple regret, defined as the difference between the expected outcomes of the true best arm and the estimated best arm. Specifically, we first derive a minimax lower bound for the expected simple regret, which characterizes the worst-case performance achievable under the location-shift distributions, including Gaussian distributions. We then show that the simple regret of the Neyman allocation asymptotically matches this lower bound, including the constant term, not just the rate in terms of the sample size, under the worst-case distribution. Notably, our optimality result holds without imposing locality restrictions on the distribution, such as the local asymptotic normality. Furthermore, we demonstrate that the Neyman allocation reduces to the uniform allocation, i.e., the standard randomized controlled trial, under Bernoulli distributions.
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
Best arm identification problem elegantly amalgamates regret minimization and BAI.
Agent's goal is to identify the best arm with a prescribed confidence level.
Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z) - Generalized Neyman Allocation for Locally Minimax Optimal Best-Arm Identification [10.470114319701576]
This study investigates anally locally minimax optimal algorithm for fixed-budget best-arm identification (BAI)
We propose the Generalized Neyman Allocation (GNA) algorithm and demonstrate that its worst-case upper bound on the probability of misidentifying the best arm aligns with the worst-case lower bound under the small-gap regime.
Our lower and upper bounds are tight, matching exactly including constant terms within the small-gap regime.
arXiv Detail & Related papers (2024-05-29T17:43:13Z) - Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances [10.470114319701576]
We propose a strategy that estimates variances during an adaptive experiment and draws arms with a ratio of the estimated standard deviations.
Our results suggest that under the worst-case scenario characterized by the small-gap regime, our strategy, which employs estimated variance, is optimalally even when the variances are unknown.
arXiv Detail & Related papers (2023-12-20T03:28:49Z) - Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a
Fixed Budget [10.470114319701576]
This study investigates the experimental design problem for identifying the arm with the highest expected outcome.
Under the assumption that the variances are known, we propose the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy.
We show that the GNA-EBA strategy is infinitelyally optimal in sense that its probability of misidentification aligns with the lower bounds.
arXiv Detail & Related papers (2023-10-30T17:52:46Z) - Best Arm Identification in Bandits with Limited Precision Sampling [14.011731120150124]
We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection.
We propose a modified tracking-based algorithm to handle non-unique optimal allocations.
arXiv Detail & Related papers (2023-05-10T12:07:48Z) - Asymptotically Optimal Fixed-Budget Best Arm Identification with
Variance-Dependent Bounds [10.915684166086026]
We investigate the problem of fixed-budget best arm identification (BAI) for minimizing expected simple regret.
We evaluate the decision based on the expected simple regret, which is the difference between the expected outcomes of the best arm and the recommended arm.
We propose the Two-Stage (TS)-Hirano-Imbens-Ridder (HIR) strategy, which utilizes the HIR estimator (Hirano et al., 2003) in recommending the best arm.
arXiv Detail & Related papers (2023-02-06T18:27:11Z) - Finite-Time Regret of Thompson Sampling Algorithms for Exponential
Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling, termed Expulli, which uses a novel sampling distribution to avoid the under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z) - A Dimensionality Reduction Method for Finding Least Favorable Priors
with a Focus on Bregman Divergence [108.28566246421742]
This paper develops a dimensionality reduction method that allows us to move the optimization to a finite-dimensional setting with an explicit bound on the dimension.
In order to make progress on the problem, we restrict ourselves to Bayesian risks induced by a relatively large class of loss functions, namely Bregman divergences.
arXiv Detail & Related papers (2022-02-23T16:22:28Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all true distributions'' that are $alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z) - Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO) for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.