Asymptotically Optimal Fixed-Budget Best Arm Identification with
Variance-Dependent Bounds
- URL: http://arxiv.org/abs/2302.02988v2
- Date: Wed, 12 Jul 2023 16:06:58 GMT
- Title: Asymptotically Optimal Fixed-Budget Best Arm Identification with
Variance-Dependent Bounds
- Authors: Masahiro Kato, Masaaki Imaizumi, Takuya Ishihara, Toru Kitagawa
- Abstract summary: We investigate the problem of fixed-budget best arm identification (BAI) for minimizing expected simple regret.
We evaluate the decision based on the expected simple regret, which is the difference between the expected outcomes of the best arm and the recommended arm.
We propose the Two-Stage (TS)-Hirano-Imbens-Ridder (HIR) strategy, which utilizes the HIR estimator (Hirano et al., 2003) in recommending the best arm.
- Score: 10.915684166086026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of fixed-budget best arm identification (BAI) for
minimizing expected simple regret. In an adaptive experiment, a decision maker
draws one of multiple treatment arms based on past observations and observes
the outcome of the drawn arm. After the experiment, the decision maker
recommends the treatment arm with the highest expected outcome. We evaluate the
decision based on the expected simple regret, which is the difference between
the expected outcomes of the best arm and the recommended arm. Due to inherent
uncertainty, we evaluate the regret using the minimax criterion. First, we
derive asymptotic lower bounds for the worst-case expected simple regret,
whose leading factor is characterized by the variances of the potential
outcomes.
Based on the lower bounds, we propose the Two-Stage (TS)-Hirano-Imbens-Ridder
(HIR) strategy, which utilizes the HIR estimator (Hirano et al., 2003) in
recommending the best arm. Our theoretical analysis shows that the TS-HIR
strategy is asymptotically minimax optimal, meaning that the leading factor of
its worst-case expected simple regret matches our derived worst-case lower
bound. Additionally, we consider extensions of our method, such as the
asymptotic optimality for the probability of misidentification. Finally, we
validate the proposed method's effectiveness through simulations.
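As a rough illustration of the two-stage idea, the following is a minimal simulation sketch assuming Gaussian outcomes: uniform exploration to estimate variances, then sampling in proportion to the estimated standard deviations, with the recommendation made by plug-in sample means rather than the paper's full HIR estimator. All names and parameter choices here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_stage_bai(means, stds, budget=2000, exploration_frac=0.2):
    """Simplified two-stage fixed-budget BAI (in the spirit of TS-HIR).

    Stage 1: round-robin exploration to estimate arm variances.
    Stage 2: randomized sampling with probabilities proportional to the
    estimated standard deviations (the variance-dependent allocation the
    lower bound points to). The recommendation uses plug-in sample means,
    a simplification of the HIR estimator used in the paper.
    """
    k = len(means)
    draws = [[] for _ in range(k)]

    # Stage 1: uniform exploration.
    t1 = int(exploration_frac * budget)
    for t in range(t1):
        a = t % k
        draws[a].append(rng.normal(means[a], stds[a]))

    # Estimated standard deviations define the stage-2 target allocation.
    sigma_hat = np.array([np.std(d, ddof=1) for d in draws])
    probs = sigma_hat / sigma_hat.sum()

    # Stage 2: random sampling tracking the target allocation.
    for _ in range(budget - t1):
        a = rng.choice(k, p=probs)
        draws[a].append(rng.normal(means[a], stds[a]))

    mu_hat = [np.mean(d) for d in draws]
    return int(np.argmax(mu_hat))

means, stds = [1.0, 0.8, 0.5], [1.0, 2.0, 0.5]
rec = two_stage_bai(means, stds)
print("recommended:", rec, "simple regret:", max(means) - means[rec])
```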
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
The best arm identification problem elegantly amalgamates regret minimization and BAI.
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
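As background for the entry above, here is a sketch of the classical Bernoulli KL-UCB index, computed by bisection; the paper's Double KL-UCB algorithm presumably builds on indices of this form, but the exploration budget passed as `budget_term` is a conventional, assumed choice rather than the paper's.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n_pulls, budget_term, tol=1e-6):
    """Largest q with n_pulls * kl(p_hat, q) <= budget_term, by bisection."""
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n_pulls * bernoulli_kl(p_hat, mid) <= budget_term:
            lo = mid
        else:
            hi = mid
    return lo

# Example: arm with empirical mean 0.6 after 50 pulls at round t = 1000.
print(kl_ucb_index(0.6, 50, math.log(1000)))
```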
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z)
- Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances [10.470114319701576]
We propose a strategy that estimates variances during an adaptive experiment and draws arms with a ratio of the estimated standard deviations.
Our results suggest that under the worst-case scenario characterized by the small-gap regime, our strategy, which employs estimated variances, is asymptotically optimal even when the variances are unknown.
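A minimal sketch of such a sampling rule for two Gaussian arms follows, assuming the tracking is done by pulling whichever arm is below its target share; this specific tracking rule is an assumption for illustration, not the paper's exact strategy.

```python
import numpy as np

rng = np.random.default_rng(1)

def std_ratio_tracking(mu, sigma, budget=1000):
    """Two-armed Gaussian BAI sketch: track an allocation proportional to
    the estimated standard deviations, then recommend the empirical best."""
    obs = [[rng.normal(mu[a], sigma[a]) for _ in range(2)] for a in (0, 1)]
    for _ in range(budget - 4):
        s0, s1 = (np.std(o, ddof=1) for o in obs)
        n0, n1 = len(obs[0]), len(obs[1])
        # Pull arm 0 iff its share n0/(n0+n1) is below its target s0/(s0+s1).
        a = 0 if n0 * (s0 + s1) < s0 * (n0 + n1) else 1
        obs[a].append(rng.normal(mu[a], sigma[a]))
    return int(np.argmax([np.mean(o) for o in obs]))

print("recommended arm:", std_ratio_tracking(mu=[0.5, 0.4], sigma=[1.0, 2.0]))
```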
arXiv Detail & Related papers (2023-12-20T03:28:49Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
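The blurb does not specify sred's rejection rule, so the following is only a generic fixed-budget elimination sketch driven by empirical gaps to the current leader; the threshold and checkpoint schedule are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def gap_based_rejection(means, budget=3000, check_every=50, thresh=0.25):
    """Fixed-budget elimination sketch: sample active arms round-robin and
    reject, at each checkpoint, every arm whose empirical gap to the current
    leader exceeds `thresh`. Illustrative only, not the paper's exact rule."""
    k = len(means)
    pulls, sums = np.zeros(k, dtype=int), np.zeros(k)
    active = list(range(k))
    for t in range(budget):
        a = active[t % len(active)]
        sums[a] += rng.normal(means[a], 1.0)  # unit-variance Gaussian rewards
        pulls[a] += 1
        if t >= k and t % check_every == 0 and len(active) > 1:
            mu_hat = sums[active] / pulls[active]
            leader_mean = mu_hat.max()
            active = [b for b, m in zip(active, mu_hat) if m > leader_mean - thresh]
    mu_hat = sums[active] / pulls[active]
    return active[int(np.argmax(mu_hat))]

print(gap_based_rejection([0.9, 0.7, 0.5, 0.3]))
```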
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Data-driven optimal stopping: A pure exploration analysis [1.0742675209112622]
Minimax optimality is verified by completing the upper bound results with matching lower bounds on the simple regret.
We investigate how our results on the simple regret transfer to the cumulative regret for a specific exploration-exploitation strategy.
arXiv Detail & Related papers (2023-12-10T13:16:01Z)
- Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a Fixed Budget [10.470114319701576]
This study investigates the experimental design problem for identifying the arm with the highest expected outcome.
Under the assumption that the variances are known, we propose the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy.
We show that the GNA-EBA strategy is asymptotically optimal in the sense that its probability of misidentification aligns with the lower bounds.
arXiv Detail & Related papers (2023-10-30T17:52:46Z)
- Semiparametric Best Arm Identification with Contextual Information [10.915684166086026]
We study best-arm identification with a fixed budget and contextual information in bandit problems.
We develop the "Contextual RS-AIPW strategy," which consists of the random sampling rule tracking a target allocation ratio and the recommendation rule.
Our proposed Contextual RS-AIPW strategy is optimal in the sense that the upper bound on its probability of misidentification matches the semiparametric lower bound as the budget goes to infinity.
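The strategy's recommendation rests on the augmented inverse-probability-weighted (AIPW) estimator. Below is a minimal sketch of AIPW arm-mean estimation from logged adaptive data; the array layout and the deliberately crude zero regressor in the usage example are assumptions for illustration.

```python
import numpy as np

def aipw_arm_means(actions, rewards, propensities, reg_preds):
    """Augmented IPW estimates of each arm's mean outcome.

    actions[t]        : arm pulled at round t
    rewards[t]        : observed outcome
    propensities[t,a] : probability the sampling rule assigned arm a at t
    reg_preds[t,a]    : regression estimate of E[Y | arm a, context X_t]
    """
    T, K = reg_preds.shape
    est = np.zeros(K)
    for a in range(K):
        pulled = (actions == a)
        ipw_term = pulled * (rewards - reg_preds[:, a]) / propensities[:, a]
        est[a] = np.mean(ipw_term + reg_preds[:, a])
    return est

rng = np.random.default_rng(3)
T, K = 1000, 2
prop = np.full((T, K), 0.5)                 # uniform sampling rule
acts = rng.integers(0, K, size=T)
rews = rng.normal(acts.astype(float), 1.0)  # true arm means 0 and 1
preds = np.zeros((T, K))                    # crude regressor, for illustration
print(aipw_arm_means(acts, rews, prop, preds))
```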
arXiv Detail & Related papers (2022-09-15T14:38:47Z)
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
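As a pointer to what off-policy evaluation computes here, a minimal inverse-propensity-scoring sketch follows; the logging setup in the example is hypothetical, and the paper's minimax rate-optimal procedures are more refined than plain IPS.

```python
import numpy as np

def ips_value(target_probs, behavior_probs, actions, rewards):
    """Inverse-propensity-scoring estimate of a target policy's value
    from logged bandit data gathered under a known behavior policy."""
    idx = np.arange(len(actions))
    w = target_probs[idx, actions] / behavior_probs[idx, actions]
    return float(np.mean(w * rewards))

# Hypothetical logs: uniform behavior policy over 2 arms, target plays arm 1.
rng = np.random.default_rng(4)
T = 5000
behavior = np.full((T, 2), 0.5)
target = np.tile([0.0, 1.0], (T, 1))
actions = rng.integers(0, 2, size=T)
rewards = rng.normal(actions.astype(float), 1.0)  # arm means 0 and 1
print(ips_value(target, behavior, actions, rewards))  # approx 1.0
```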
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
- Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $\epsilon$.
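One natural reading of this criterion, sketched below: per-round gaps of at most $\epsilon$ contribute zero to the cumulative regret. The exact functional form in the paper may differ.

```python
def lenient_regret(best_mean, chosen_means, eps=0.1):
    """Cumulative lenient regret: per-round gaps <= eps count as zero."""
    return sum(g for g in (best_mean - m for m in chosen_means) if g > eps)

# Gaps are 0, 0.05, 0.3, 0.6; only the last two exceed eps, so regret = 0.9.
print(lenient_regret(1.0, [1.0, 0.95, 0.7, 0.4], eps=0.1))
```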
arXiv Detail & Related papers (2020-08-10T08:30:52Z)
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)