Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse
Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances
- URL: http://arxiv.org/abs/2201.04469v2
- Date: Thu, 13 Jan 2022 03:48:26 GMT
- Title: Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse
Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances
- Authors: Masahiro Kato and Kaito Ariu and Masaaki Imaizumi and Masatoshi Uehara
and Masahiro Nomura and Chao Qin
- Abstract summary: We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a sampling rule with randomized sampling (RS) following the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is asymptotically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
- Score: 27.122181278234617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the fixed-budget best arm identification problem in two-armed
Gaussian bandits with unknown variances. The tightest lower bound on the
complexity and an algorithm whose performance guarantee matches the lower bound
have long been open problems when the variances are unknown and when the
algorithm is agnostic to the optimal proportion of the arm draws. In this
paper, we propose a strategy comprising a sampling rule with randomized
sampling (RS) following the estimated target allocation probabilities of arm
draws and a recommendation rule using the augmented inverse probability
weighting (AIPW) estimator, which is often used in the causal inference
literature. We refer to our strategy as the RS-AIPW strategy. In the
theoretical analysis, we first derive a large deviation principle for
martingales, which can be used when the second moment converges in mean, and
apply it to our proposed strategy. Then, we show that the proposed strategy is
asymptotically optimal in the sense that the probability of misidentification
achieves the lower bound by Kaufmann et al. (2016) when the sample size becomes
infinitely large and the gap between the two arms goes to zero.
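As a rough illustration of how the pieces fit together, here is a minimal simulation sketch of an RS-AIPW-style run on a two-armed Gaussian bandit: arms are drawn at random according to a plug-in estimate of the Neyman-type target allocation sd(1)/(sd(0)+sd(1)), and the recommended arm is the one with the larger AIPW estimate. The function name, the probability clipping, and the crude plug-in estimators are our simplifications, not the paper's exact construction.

```python
import numpy as np

def rs_aipw(mu, sigma, T, rng=None):
    """One simulated run of an RS-AIPW-style strategy (simplified sketch).

    mu, sigma: length-2 arrays of true means / std devs, used only to
    simulate rewards. Returns the index of the recommended arm.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = np.zeros(2)      # draw counts per arm
    s = np.zeros(2)      # running reward sums
    sq = np.zeros(2)     # running sums of squared rewards
    aipw = np.zeros(2)   # running sums of AIPW scores

    for t in range(T):
        # Plug-in estimates from past data (defaults before arms are seen).
        mu_hat = np.where(n > 0, s / np.maximum(n, 1), 0.0)
        var_hat = np.where(n > 1, sq / np.maximum(n, 1) - mu_hat**2, 1.0)
        sd_hat = np.sqrt(np.maximum(var_hat, 1e-8))

        # Randomized sampling (RS): draw arm 1 with the estimated
        # Neyman-type probability sd1 / (sd0 + sd1), clipped for stability.
        w1 = np.clip(sd_hat[1] / (sd_hat[0] + sd_hat[1]), 0.01, 0.99)
        w = np.array([1.0 - w1, w1])
        a = int(rng.random() < w1)

        x = rng.normal(mu[a], sigma[a])  # observe a reward

        # AIPW score per arm: propensity-corrected residual + plug-in mean.
        for k in range(2):
            aipw[k] += ((a == k) / w[k]) * (x - mu_hat[k]) + mu_hat[k]

        n[a] += 1; s[a] += x; sq[a] += x**2

    return int(np.argmax(aipw))

# Misidentification frequency over repeated runs (arm 1 is truly best).
runs = [rs_aipw(np.array([0.0, 0.2]), np.array([1.0, 2.0]), T=500)
        for _ in range(200)]
print("P(misidentify) ~", 1.0 - np.mean(np.array(runs) == 1))
```

The AIPW correction term (the indicator over the sampling probability, times the residual) is what keeps the mean estimates well-behaved even though the allocation probabilities are estimated adaptively from past data.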
Related papers
- Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances [10.470114319701576]
We propose a strategy that estimates variances during an adaptive experiment and draws arms with a ratio of the estimated standard deviations.
Our results suggest that under the worst-case scenario characterized by the small-gap regime, our strategy, which employs the estimated variances, is asymptotically optimal even when the variances are unknown.
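A minimal sketch of this sampling rule follows, under the assumption that it is implemented by deterministic tracking (each round draws whichever arm is most under-sampled relative to the target share of estimated standard deviations); the function name and initialization are illustrative.

```python
import numpy as np

def track_std_ratio(mu, sigma, T, rng=None):
    """Sketch: keep each arm's share of draws close to its share of the
    estimated standard deviations (hypothetical tracking implementation)."""
    rng = np.random.default_rng() if rng is None else rng
    obs = [[], []]
    # Two forced draws per arm so that variances are estimable.
    for a in (0, 1):
        for _ in range(2):
            obs[a].append(rng.normal(mu[a], sigma[a]))
    for t in range(4, T):
        sd = np.array([np.std(obs[0], ddof=1), np.std(obs[1], ddof=1)])
        sd = np.maximum(sd, 1e-8)
        target = sd / sd.sum()                       # desired draw fractions
        share = np.array([len(obs[0]), len(obs[1])]) / t
        a = int(np.argmax(target - share))           # most under-sampled arm
        obs[a].append(rng.normal(mu[a], sigma[a]))
    # Recommend the arm with the larger sample mean.
    return int(np.argmax([np.mean(obs[0]), np.mean(obs[1])]))

print(track_std_ratio(np.array([0.0, 0.1]), np.array([1.0, 3.0]), T=1000))
```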
arXiv Detail & Related papers (2023-12-20T03:28:49Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms at any round based on the observed empirical gaps between the rewards of the various arms.
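The summary does not pin down the rejection rule, so the sketch below only illustrates the general shape of such an algorithm: round-robin sampling over surviving arms, with any arm rejected at any round once its empirical gap to the leader exceeds a confidence radius. The radius c*sqrt(log(T)/n) is our generic choice for this sketch, not the paper's calibration.

```python
import numpy as np

def reject_anytime(mu, T, c=2.0, rng=None):
    """Illustrative fixed-budget BAI with rejection allowed at any round.
    The threshold c * sqrt(log(T) / n) is a generic confidence radius
    chosen for this sketch, not the cited paper's calibration."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(mu)
    alive = list(range(K))
    n = np.zeros(K); s = np.zeros(K)
    t = 0
    while t < T and len(alive) > 1:
        for a in list(alive):        # one round-robin pass over survivors
            if t >= T:
                break
            s[a] += rng.normal(mu[a], 1.0); n[a] += 1; t += 1
        means = s[alive] / np.maximum(n[alive], 1)
        leader = means.max()
        radius = c * np.sqrt(np.log(T) / np.maximum(n[alive], 1))
        # Reject arms whose empirical gap to the leader exceeds the radius;
        # the leader itself always survives.
        alive = [a for a, m, r in zip(alive, means, radius) if m + r >= leader]
    means = s[alive] / np.maximum(n[alive], 1)
    return alive[int(np.argmax(means))]

print(reject_anytime(np.array([0.1, 0.3, 0.5, 0.2]), T=2000))
```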
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Semiparametric Best Arm Identification with Contextual Information [10.915684166086026]
We study best-arm identification with a fixed budget and contextual information in bandit problems.
We develop the "Contextual RS-AIPW strategy," which consists of the random sampling rule tracking a target allocation ratio and the recommendation rule.
Our proposed Contextual RS-AIPW strategy is optimal in the sense that the upper bound on the probability of misidentification matches the semiparametric lower bound as the budget goes to infinity.
arXiv Detail & Related papers (2022-09-15T14:38:47Z)
- Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform random assignment for statistical analysis with those of reward maximization.
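One simple instance of such a combination, offered purely as our illustration rather than the paper's exact algorithms, is an epsilon-mixture: each participant is assigned uniformly at random with probability eps (preserving clean data for statistical analysis) and by Thompson Sampling otherwise (pursuing reward).

```python
import numpy as np

def mixed_uniform_ts(p_true, T, eps=0.2, rng=None):
    """2-arm Bernoulli experiment: uniform random assignment w.p. eps,
    Thompson Sampling from Beta posteriors otherwise. The eps-mixture is
    a hypothetical combination rule used for illustration."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.ones(2); beta = np.ones(2)     # Beta(1, 1) priors
    total = 0
    for _ in range(T):
        if rng.random() < eps:
            a = int(rng.integers(2))                    # uniform assignment
        else:
            a = int(np.argmax(rng.beta(alpha, beta)))   # Thompson draw
        r = int(rng.random() < p_true[a])               # Bernoulli reward
        alpha[a] += r; beta[a] += 1 - r                 # posterior update
        total += r
    return total / T   # average reward earned during the experiment

print(mixed_uniform_ts(np.array([0.4, 0.6]), T=1000))
```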
arXiv Detail & Related papers (2021-12-15T22:11:58Z)
- From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
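A sketch of one DS round follows, with the duel rule simplified from the paper: the empirical leader is compared pairwise against each challenger via a Dirichlet-reweighted mean of the challenger's own observations, and a challenger that wins its duel gets pulled. Initialization and the boundary adjustments for bounded supports are omitted.

```python
import numpy as np

def dirichlet_sampling(mu, T, rng=None):
    """Simplified DS loop: the empirical leader duels each challenger via
    a Dirichlet-reweighted mean of the challenger's observations."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(mu)
    obs = [[rng.normal(m, 1.0)] for m in mu]   # one forced pull per arm
    for _ in range(K, T):
        means = np.array([np.mean(o) for o in obs])
        leader = int(np.argmax(means))
        pulled = False
        for k in range(K):
            if k == leader:
                continue
            w = rng.dirichlet(np.ones(len(obs[k])))    # re-sampling weights
            index = float(w @ np.array(obs[k]))        # re-sampled mean index
            if index >= means[leader]:                 # challenger wins duel
                obs[k].append(rng.normal(mu[k], 1.0))
                pulled = True
                break
        if not pulled:                                 # all duels lost:
            obs[leader].append(rng.normal(mu[leader], 1.0))  # pull leader
    return [len(o) for o in obs]                       # pull counts per arm

print(dirichlet_sampling([0.2, 0.5, 0.4], T=500))
```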
arXiv Detail & Related papers (2021-11-18T14:34:21Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms for best arm identification in sub-Gaussian bandits: a gap-based algorithm and one based on successive elimination.
Specifically, the gap-based algorithm's sample complexity is optimal up to constant factors, while the successive-elimination algorithm's is optimal up to logarithmic factors.
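For context, here is a vanilla successive-elimination skeleton with a standard union-bound confidence radius; the cited paper's contribution is robustness to contamination, which would replace the sample mean below with a robust estimator (flagged in the comment), so treat this as background rather than the paper's algorithm.

```python
import numpy as np

def successive_elimination(mu, delta=0.05, max_rounds=10_000, rng=None):
    """Vanilla successive elimination (fixed confidence). The cited paper
    would replace the sample mean with a contamination-robust estimator,
    e.g. a trimmed mean; the plain mean is kept here for brevity."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(mu)
    alive = list(range(K))
    s = np.zeros(K)
    n = 0
    while len(alive) > 1 and n < max_rounds:
        for a in alive:                      # pull every surviving arm once
            s[a] += rng.normal(mu[a], 1.0)
        n += 1
        means = s[alive] / n
        rad = np.sqrt(2 * np.log(4 * K * n**2 / delta) / n)  # union bound
        best_lcb = means.max() - rad
        # Eliminate arms whose upper confidence bound falls below best_lcb.
        alive = [a for a, m in zip(alive, means) if m + rad >= best_lcb]
    means = s[alive] / max(n, 1)
    return alive[int(np.argmax(means))]

print(successive_elimination(np.array([0.1, 0.5, 0.3])))
```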
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits [0.0]
We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance.
This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: we also prove non-asymptotic bounds that hold with high probability.
arXiv Detail & Related papers (2021-05-27T07:42:49Z)
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
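As an illustration of the general recipe (not the paper's exact algorithms), the sketch below scores each Gaussian arm by a posterior sample of a mean-variance index rho*mean - variance, drawn from a Normal-Inverse-Gamma-style posterior with vague hyperparameters, and pulls the best-scoring arm.

```python
import numpy as np

def mv_thompson(mu, sigma, rho, T, rng=None):
    """Sketch of Thompson Sampling for mean-variance Gaussian bandits:
    score each arm by a posterior sample of rho * mean - variance (one
    common mean-variance convention); hyperparameters are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(mu)
    obs = [[rng.normal(mu[a], sigma[a])] for a in range(K)]  # warm start
    for _ in range(K, T):
        score = np.empty(K)
        for a in range(K):
            x = np.array(obs[a]); m = len(x)
            xbar = x.mean(); ssq = ((x - xbar) ** 2).sum()
            # Inverse-Gamma sample for the variance (vague prior).
            var = 1.0 / rng.gamma(shape=(m + 1) / 2, scale=2.0 / (ssq + 1.0))
            mean = rng.normal(xbar, np.sqrt(var / m))  # posterior mean draw
            score[a] = rho * mean - var                # mean-variance index
        a = int(np.argmax(score))
        obs[a].append(rng.normal(mu[a], sigma[a]))
    return [len(o) for o in obs]   # pull counts per arm

print(mv_thompson([0.5, 0.6], [1.0, 2.0], rho=1.0, T=500))
```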
arXiv Detail & Related papers (2020-02-01T15:33:50Z)