Minimax and Bayes Optimal Best-Arm Identification
- URL: http://arxiv.org/abs/2506.24007v3
- Date: Wed, 01 Oct 2025 17:11:29 GMT
- Title: Minimax and Bayes Optimal Best-Arm Identification
- Authors: Masahiro Kato,
- Abstract summary: We consider an adaptive procedure consisting of a sampling phase followed by a recommendation phase.<n>In our proposed strategy, the sampling phase consists of two stages. The first stage is a pilot phase, in which we allocate each arm uniformly in equal proportions.<n>After the sampling phase, the procedure enters the recommendation phase, where we select the arm with the highest sample mean as our estimate of the best arm.
- Score: 6.44705221140412
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study investigates minimax and Bayes optimal strategies in fixed-budget best-arm identification. We consider an adaptive procedure consisting of a sampling phase followed by a recommendation phase, and we design an adaptive experiment within this framework to efficiently identify the best arm, defined as the one with the highest expected outcome. In our proposed strategy, the sampling phase consists of two stages. The first stage is a pilot phase, in which we allocate each arm uniformly in equal proportions to eliminate clearly suboptimal arms and estimate outcome variances. In the second stage, arms are allocated in proportion to the variances estimated during the first stage. After the sampling phase, the procedure enters the recommendation phase, where we select the arm with the highest sample mean as our estimate of the best arm. We prove that this single strategy is simultaneously asymptotically minimax and Bayes optimal for the simple regret, with upper bounds that coincide exactly with our lower bounds, including the constant terms.
Related papers
- Orders matter: tight bounds on the precision of sequential quantum estimation for multiparameter models [0.9379969114114787]
In quantum metrology, the ultimate precision of joint estimation is dictated by the Holevo Cram'er-Rao bound.<n>In this paper, we discuss and analyze in detail an alternative approach: the stepwise estimation strategy.<n>We derive a tight and achievable precision bound for this protocol, the stepwise separable bound, and provide its closed-form analytical expression.
arXiv Detail & Related papers (2025-10-16T17:59:15Z) - Admissibility of Completely Randomized Trials: A Large-Deviation Approach [4.970364068620608]
We find that whenever there are at least three treatment arms, there exist simple adaptive designs that universally and strictly dominate non-adaptive completely randomized trials.<n>This dominance is characterized by a notion called efficiency exponent, which quantifies a design's statistical efficiency when the experimental sample is large.
arXiv Detail & Related papers (2025-06-05T17:58:43Z) - Towards Regulatory-Confirmed Adaptive Clinical Trials: Machine Learning Opportunities and Solutions [59.28853595868749]
We introduce two new objectives for future clinical trials that integrate regulatory constraints and treatment policy value for both the entire population and under-served populations.<n>We formulate Randomize First Augment Next (RFAN), a new framework for designing Phase III clinical trials.<n>Our framework consists of a standard randomized component followed by an adaptive one, jointly meant to efficiently and safely acquire and assign patients into treatment arms during the trial.
arXiv Detail & Related papers (2025-03-12T10:17:54Z) - Minimax Optimal Simple Regret in Two-Armed Best-Arm Identification [10.470114319701576]
We prove the minimax optimality of the Neyman allocation for the simple regret.<n>We show that our optimality result holds without imposing locality restrictions on the local normality.
arXiv Detail & Related papers (2024-12-23T18:06:20Z) - Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits [35.35226227009685]
Good arm identification (IGA) is a practical bandit inference objective that aims to label arms with means above a threshold as quickly as possible.
We show that GA can be efficiently solved by combining a reward-maximizing sampling algorithm with a novel non-valid sequential test for labeling arm means.
Our empirical results validate our approach beyond the minimax setting, reducing the expected number of samples for all stopping times by at least 50% across both synthetic and real-world settings.
arXiv Detail & Related papers (2024-10-21T01:19:23Z) - Optimal Adaptive Experimental Design for Estimating Treatment Effect [14.088972921434761]
This paper addresses the fundamental question of determining the optimal accuracy in estimating the treatment effect.
By incorporating the concept of doubly robust method into sequential experimental design, we frame the optimal estimation problem as an online bandit learning problem.
Using tools and ideas from both bandit algorithm design and adaptive statistical estimation, we propose a general low switching adaptive experiment framework.
arXiv Detail & Related papers (2024-10-07T23:22:51Z) - Best Arm Identification with Minimal Regret [55.831935724659175]
Best arm identification problem elegantly amalgamates regret minimization and BAI.
Agent's goal is to identify the best arm with a prescribed confidence level.
Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z) - Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the emphconfounded pure exploration transductive linear bandit (textttCPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
arXiv Detail & Related papers (2024-06-15T20:54:48Z) - Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.<n>We study multi-fidelity best-arm identification, in which the can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.<n>Several methods have been proposed for tackling this problem, but their optimality remain elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Adaptive Experimental Design for Policy Learning [8.73717644648873]
We consider a decision-maker who assigns treatment arms to experimental units during an experiment and recommends the estimated best treatment arm based on the contexts at the end of the experiment.<n>We focus on the worst-case expected regret, a measure between the expected outcomes of an optimal policy and our proposed policy.<n>We prove that this strategy is minimax rate-optimal in the sense that its leading factor in the regret upper bound matches the lower bound as the number of experimental units increases.
arXiv Detail & Related papers (2024-01-08T09:29:07Z) - Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a
Fixed Budget [10.470114319701576]
This study investigates the experimental design problem for identifying the arm with the highest expected outcome.
Under the assumption that the variances are known, we propose the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy.
We show that the GNA-EBA strategy is infinitelyally optimal in sense that its probability of misidentification aligns with the lower bounds.
arXiv Detail & Related papers (2023-10-30T17:52:46Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Asymptotically Optimal Fixed-Budget Best Arm Identification with
Variance-Dependent Bounds [10.915684166086026]
We investigate the problem of fixed-budget best arm identification (BAI) for minimizing expected simple regret.
We evaluate the decision based on the expected simple regret, which is the difference between the expected outcomes of the best arm and the recommended arm.
We propose the Two-Stage (TS)-Hirano-Imbens-Ridder (HIR) strategy, which utilizes the HIR estimator (Hirano et al., 2003) in recommending the best arm.
arXiv Detail & Related papers (2023-02-06T18:27:11Z) - TCFimt: Temporal Counterfactual Forecasting from Individual Multiple
Treatment Perspective [50.675845725806724]
We propose a comprehensive framework of temporal counterfactual forecasting from an individual multiple treatment perspective (TCFimt)
TCFimt constructs adversarial tasks in a seq2seq framework to alleviate selection and time-varying bias and designs a contrastive learning-based block to decouple a mixed treatment effect into separated main treatment effects and causal interactions.
The proposed method shows satisfactory performance in predicting future outcomes with specific treatments and in choosing optimal treatment type and timing than state-of-the-art methods.
arXiv Detail & Related papers (2022-12-17T15:01:05Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Variational Refinement for Importance Sampling Using the Forward
Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z) - Bayesian prognostic covariate adjustment [59.75318183140857]
Historical data about disease outcomes can be integrated into the analysis of clinical trials in many ways.
We build on existing literature that uses prognostic scores from a predictive model to increase the efficiency of treatment effect estimates.
arXiv Detail & Related papers (2020-12-24T05:19:03Z) - Assisted Probe Positioning for Ultrasound Guided Radiotherapy Using
Image Sequence Classification [55.96221340756895]
Effective transperineal ultrasound image guidance in prostate external beam radiotherapy requires consistent alignment between probe and prostate at each session during patient set-up.
We demonstrate a method for ensuring accurate probe placement through joint classification of images and probe position data.
Using a multi-input multi-task algorithm, spatial coordinate data from an optically tracked ultrasound probe is combined with an image clas-sifier using a recurrent neural network to generate two sets of predictions in real-time.
The algorithm identified optimal probe alignment within a mean (standard deviation) range of 3.7$circ$ (1.2$circ$) from
arXiv Detail & Related papers (2020-10-06T13:55:02Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Optimal Experimental Design for Staggered Rollouts [11.187415608299075]
We study the design and analysis of experiments conducted on a set of units over multiple time periods where the starting time of the treatment may vary by unit.
We propose a new algorithm, the Precision-Guided Adaptive Experiment (PGAE) algorithm, that addresses the challenges at both the design stage and at the stage of estimating treatment effects.
arXiv Detail & Related papers (2019-11-09T19:46:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.