Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a
Fixed Budget
- URL: http://arxiv.org/abs/2310.19788v3
- Date: Mon, 11 Mar 2024 00:56:02 GMT
- Title: Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a
Fixed Budget
- Authors: Masahiro Kato
- Abstract summary: This study investigates the experimental design problem for identifying the arm with the highest expected outcome.
Under the assumption that the variances are known, we propose the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy.
We show that the GNA-EBA strategy is asymptotically optimal in the sense that its probability of misidentification aligns with the lower bounds.
- Score: 10.470114319701576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study investigates the experimental design problem for identifying the
arm with the highest expected outcome, referred to as best arm identification
(BAI). In our experiments, the number of treatment-allocation rounds is fixed.
During each round, a decision-maker allocates an arm and observes a
corresponding outcome, which follows a Gaussian distribution with variances
that can differ among the arms. At the end of the experiment, the
decision-maker recommends one of the arms as an estimate of the best arm. To
design an experiment, we first discuss lower bounds for the probability of
misidentification. Our analysis highlights that the available information on
the outcome distribution, such as means (expected outcomes), variances, and the
choice of the best arm, significantly influences the lower bounds. Because
available information is limited in actual experiments, we develop a lower
bound that is valid under the unknown means and the unknown choice of the best
arm, which are referred to as the worst-case lower bound. We demonstrate that
the worst-case lower bound depends solely on the variances of the outcomes.
Then, under the assumption that the variances are known, we propose the
Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy, an
extension of the Neyman allocation proposed by Neyman (1934). We show that the
GNA-EBA strategy is asymptotically optimal in the sense that its probability of
misidentification aligns with the lower bounds as the sample size increases
infinitely and the differences between the expected outcomes of the best and
other suboptimal arms converge to the same values across arms. We refer to such
strategies as asymptotically worst-case optimal.
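The hedged sketch below illustrates the fixed-budget workflow described in the abstract: with known variances, the number of draws per arm is fixed in advance, and the arm with the highest sample mean is recommended at the end (the EBA rule). The standard-deviation-proportional weights, the function name gna_eba_sketch, and the simulated Gaussian outcomes are illustrative assumptions only; the exact Generalized Neyman Allocation ratios are derived in the paper and are not reproduced here.

```python
import numpy as np

def gna_eba_sketch(sigmas, T, true_means, rng=None):
    """Illustrative fixed-budget strategy with known variances.

    sigmas     : known standard deviations of the K Gaussian arms
    T          : fixed budget (number of outcome observations)
    true_means : unknown in practice; used here only to simulate outcomes

    The allocation below is proportional to the standard deviations (the
    classical Neyman idea); the paper's Generalized Neyman Allocation
    prescribes its own target ratios, which are not reproduced here.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigmas = np.asarray(sigmas, dtype=float)

    # Target allocation is fixed in advance because the variances are known.
    weights = sigmas / sigmas.sum()              # illustrative placeholder
    counts = np.floor(weights * T).astype(int)
    counts[: T - counts.sum()] += 1              # distribute rounding leftovers

    # Draw each arm its allotted number of times and average the outcomes.
    sample_means = np.array([
        rng.normal(true_means[a], sigmas[a], size=max(counts[a], 1)).mean()
        for a in range(len(sigmas))
    ])

    # EBA recommendation rule: report the arm with the highest sample mean.
    return int(np.argmax(sample_means))

# Repeating this simulation many times estimates the probability of
# misidentification for a given budget and set of (known) variances.
print(gna_eba_sketch(sigmas=[1.0, 2.0, 1.5], T=600, true_means=[0.5, 0.45, 0.3]))
```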
Related papers
- Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances [10.470114319701576]
We propose a strategy that estimates the variances during an adaptive experiment and draws arms in proportion to the estimated standard deviations (a minimal sketch of this sampling rule appears after this list).
Our results suggest that under the worst-case scenario characterized by the small-gap regime, this strategy, which employs the estimated variances, is asymptotically optimal even when the variances are unknown.
arXiv Detail & Related papers (2023-12-20T03:28:49Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Optimal Best Arm Identification with Fixed Confidence in Restless Bandits [66.700654953613]
We study best arm identification in a restless multi-armed bandit setting with finitely many arms.
The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space.
It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds.
arXiv Detail & Related papers (2023-10-20T10:04:05Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Covariance Adaptive Best Arm Identification [0.0]
The goal is to identify the arm with the highest mean reward with a probability of at least $1 - \delta$, while minimizing the number of arm pulls.
We propose a more flexible scenario where arms can be dependent and rewards can be sampled simultaneously.
This framework is relevant in various applications, such as clinical trials, where similarities between patients or drugs suggest underlying correlations.
arXiv Detail & Related papers (2023-06-05T06:57:09Z) - Best Arm Identification in Bandits with Limited Precision Sampling [14.011731120150124]
We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection.
We propose a modified tracking-based algorithm to handle non-unique optimal allocations.
arXiv Detail & Related papers (2023-05-10T12:07:48Z) - Asymptotically Optimal Fixed-Budget Best Arm Identification with
Variance-Dependent Bounds [10.915684166086026]
We investigate the problem of fixed-budget best arm identification (BAI) for minimizing expected simple regret.
We evaluate the decision based on the expected simple regret, which is the difference between the expected outcomes of the best arm and the recommended arm.
We propose the Two-Stage (TS)-Hirano-Imbens-Ridder (HIR) strategy, which utilizes the HIR estimator (Hirano et al., 2003) in recommending the best arm.
arXiv Detail & Related papers (2023-02-06T18:27:11Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound on the error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse
Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a sampling rule with randomized sampling (RS) following the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is asymptotically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
arXiv Detail & Related papers (2022-01-12T13:38:33Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms for best arm identification in sub-Gaussian bandits: a gap-based algorithm and one based on successive elimination.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for successive elimination it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Towards Minimax Optimal Best Arm Identification in Linear Bandits [95.22854522340938]
We study the problem of best arm identification in linear bandits in the fixed-budget setting.
By leveraging properties of the G-optimal design and incorporating it into the arm allocation rule, we design a parameter-free algorithm, OD-LinBAI.
We provide a theoretical analysis of the failure probability of OD-LinBAI.
arXiv Detail & Related papers (2021-05-27T09:19:10Z)
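As referenced in the first related paper above (Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances), the minimal sketch below shows one way to draw two arms in proportion to standard deviations estimated on the fly and then recommend the empirical best arm. The warm-up length, the randomized draw, and the function name adaptive_neyman_sketch are assumptions for illustration, not the paper's exact strategy.

```python
import numpy as np

def adaptive_neyman_sketch(T, true_means, true_sigmas, rng=None):
    """Illustrative two-armed strategy with unknown variances: estimate the
    variances during the experiment, draw arms in proportion to the estimated
    standard deviations, and recommend the empirical best arm at the end.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = [[], []]

    for t in range(T):
        if t < 4:
            arm = t % 2                       # short uniform warm-up
        else:
            stds = np.array([np.std(samples[a], ddof=1) + 1e-8 for a in (0, 1)])
            p0 = stds[0] / stds.sum()         # target share of draws for arm 0
            arm = 0 if rng.random() < p0 else 1
        samples[arm].append(rng.normal(true_means[arm], true_sigmas[arm]))

    # Recommend the arm with the higher sample mean.
    return int(np.argmax([np.mean(s) for s in samples]))

# Repeating this simulation estimates the misidentification probability in the
# small-gap regime discussed above.
print(adaptive_neyman_sketch(T=400, true_means=[0.4, 0.35], true_sigmas=[1.0, 2.0]))
```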