Top Two Algorithms Revisited
- URL: http://arxiv.org/abs/2206.05979v1
- Date: Mon, 13 Jun 2022 09:03:24 GMT
- Title: Top Two Algorithms Revisited
- Authors: Marc Jourdan, Rémy Degenne, Dorian Baudry, Rianne de Heide and Emilie Kaufmann
- Abstract summary: Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models.
We provide a general analysis of Top Two methods, which identifies desirable properties of the leader, the challenger, and the (possibly non-parametric) distributions of the arms.
Our proof method demonstrates in particular that the sampling step used to select the leader inherited from Thompson sampling can be replaced by other choices.
- Score: 14.783452541904365
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Top Two algorithms arose as an adaptation of Thompson sampling to best arm
identification in multi-armed bandit models (Russo, 2016), for parametric
families of arms. They select the next arm to sample from by randomizing among
two candidate arms, a leader and a challenger. Despite their good empirical
performance, theoretical guarantees for fixed-confidence best arm
identification have only been obtained when the arms are Gaussian with known
variances. In this paper, we provide a general analysis of Top Two methods,
which identifies desirable properties of the leader, the challenger, and the
(possibly non-parametric) distributions of the arms. As a result, we obtain
theoretically supported Top Two algorithms for best arm identification with
bounded distributions. Our proof method demonstrates in particular that the
sampling step used to select the leader inherited from Thompson sampling can be
replaced by other choices, like selecting the empirical best arm.
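As a concrete illustration, here is a minimal Python sketch of one Top Two sampling step for Gaussian arms with known variance, in the style of Thompson-sampling-based Top Two methods. The `empirical_leader` flag reflects the abstract's point that the leader's sampling step can be replaced by selecting the empirical best arm; the names and the exact challenger rule below are illustrative, not the paper's precise algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_two_step(means, counts, sigma=1.0, beta=0.5, empirical_leader=False):
    """One Top Two sampling step for Gaussian arms with known variance sigma.

    means  : empirical mean reward of each arm
    counts : number of pulls of each arm (all assumed >= 1)
    beta   : probability of playing the leader rather than the challenger
    """
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)

    def posterior_sample():
        # Gaussian posterior under an (improper) flat prior.
        return rng.normal(means, sigma / np.sqrt(counts))

    # Leader: a Thompson sample (as in TTTS), or the empirical best arm,
    # one of the alternative choices the paper's analysis covers.
    leader = int(np.argmax(means)) if empirical_leader \
        else int(np.argmax(posterior_sample()))

    # Challenger: re-sample the posterior until an arm other than the
    # leader comes out on top (the TTTS challenger rule).
    challenger = leader
    while challenger == leader:
        challenger = int(np.argmax(posterior_sample()))

    # Randomize between the two candidate arms.
    return leader if rng.random() < beta else challenger

# e.g. next_arm = top_two_step(means=[0.4, 0.5, 0.3], counts=[10, 10, 10])
```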
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
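A minimal PyTorch sketch of fitting a reward model to pairwise preference (duel) feedback via a Bradley-Terry likelihood; the architecture, loss, and names are illustrative assumptions, not necessarily the paper's exact estimator.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Tiny MLP reward model f(x) over arm contexts."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def fit_from_duels(model, x_win, x_lose, epochs=200, lr=1e-2):
    """Fit f so that P(win beats lose) = sigmoid(f(x_win) - f(x_lose)).

    x_win / x_lose are (N, dim) tensors holding the contexts of the
    preferred and rejected arms from past duels (a Bradley-Terry model).
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        logits = model(x_win) - model(x_lose)
        # Every recorded duel is a "win", so the target is all ones.
        loss = nn.functional.binary_cross_entropy_with_logits(
            logits, torch.ones_like(logits))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```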
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
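A hedged sketch of the round-wise rejection idea: every surviving arm is sampled each round, and any arm whose empirical gap to the current leader exceeds a budget-dependent threshold is dropped immediately. The actual rejection rule in the paper comes from a large deviation analysis; `threshold` below is a caller-supplied placeholder for it.

```python
import numpy as np

def reject_any_round_bai(pull, n_arms, budget, threshold):
    """Fixed-budget BAI with rejections allowed in any round.

    pull(arm)             -> one reward sample from the given arm
    threshold(t, budget)  -> placeholder for the paper's large-deviation
                             rejection threshold at time t
    """
    alive = list(range(n_arms))
    sums = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    t = 0
    while t < budget and len(alive) > 1:
        for arm in alive:
            if t >= budget:
                break
            sums[arm] += pull(arm)
            counts[arm] += 1
            t += 1
        means = sums[alive] / np.maximum(counts[alive], 1)
        best = means.max()
        # Unlike successive rejects, an arm can be dropped in *any* round,
        # as soon as its empirical gap to the leader is too large.
        alive = [a for a, m in zip(alive, means)
                 if best - m <= threshold(t, budget)]
    means = sums / np.maximum(counts, 1)
    return max(alive, key=lambda a: means[a])  # empirical best survivor
```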
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Eliciting Kemeny Rankings [6.971011179091351]
We find approximation bounds for Kemeny rankings dependent on confidence intervals over estimated winning probabilities of arms.
We formulate several adaptive sampling methods that use look-aheads to estimate how much confidence intervals might be tightened.
arXiv Detail & Related papers (2023-12-18T19:14:42Z)
- Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a Fixed Budget [10.470114319701576]
This study investigates the experimental design problem for identifying the arm with the highest expected outcome.
Under the assumption that the variances are known, we propose the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy.
We show that the GNA-EBA strategy is asymptotically optimal in the sense that its probability of misidentification aligns with the lower bounds.
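A rough Python sketch of a Neyman-allocation-plus-empirical-best-arm strategy under known variances. Splitting the budget in proportion to the standard deviations is the classical two-armed Neyman allocation; the paper's *generalized* allocation may differ in its details, and all names here are illustrative.

```python
import numpy as np

def neyman_eba(pull, sigmas, budget):
    """Allocate pulls proportionally to known standard deviations, then
    recommend the empirical best arm. Assumes budget >= number of arms."""
    sigmas = np.asarray(sigmas, dtype=float)
    alloc = np.maximum(np.round(budget * sigmas / sigmas.sum()), 1).astype(int)
    # (rounding means the total may deviate from `budget` by a few pulls)
    means = np.array([np.mean([pull(a) for _ in range(n)])
                      for a, n in enumerate(alloc)])
    return int(np.argmax(means))  # the "EBA" recommendation
```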
arXiv Detail & Related papers (2023-10-30T17:52:46Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a fixed and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- Covariance Adaptive Best Arm Identification [0.0]
The goal is to identify the arm with the highest mean reward with a probability of at least $1 - \delta$, while minimizing the number of arm pulls.
We propose a more flexible scenario where arms can be dependent and rewards can be sampled simultaneously.
This framework is relevant in various applications, such as clinical trials, where similarities between patients or drugs suggest underlying correlations.
arXiv Detail & Related papers (2023-06-05T06:57:09Z)
- Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a sampling rule with randomized sampling (RS) following the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is asymptotically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
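A sketch of the overall shape of such a strategy under illustrative choices: the target allocation is estimated online from empirical standard deviations, the next arm is drawn at random from it (the RS step), and each mean is estimated with augmented inverse probability weighted (AIPW) scores. The paper's exact estimators and allocation differ.

```python
import numpy as np

def rs_aipw(pull, budget, rng=np.random.default_rng(1)):
    """Two-armed fixed-budget BAI: randomized sampling + AIPW recommendation.

    pull(arm) -> one reward sample. The target allocation for arm 0 is
    estimated online as std_0 / (std_0 + std_1).
    """
    obs = [[], []]                    # rewards observed per arm
    aipw = np.zeros(2)                # running sums of AIPW scores
    for t in range(budget):
        stds = [max(np.std(o), 1e-3) if len(o) > 1 else 1.0 for o in obs]
        w0 = stds[0] / (stds[0] + stds[1])
        probs = np.array([w0, 1.0 - w0])
        mu_hat = np.array([np.mean(o) if o else 0.0 for o in obs])
        arm = int(rng.random() >= w0)         # randomized sampling (RS) step
        x = pull(arm)
        obs[arm].append(x)
        # AIPW score: plug-in mean plus inverse-propensity-weighted residual,
        # which corrects for the adaptive (data-dependent) sampling.
        score = mu_hat.copy()
        score[arm] += (x - mu_hat[arm]) / probs[arm]
        aipw += score
    return int(np.argmax(aipw / budget))      # recommend the larger AIPW mean
```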
arXiv Detail & Related papers (2022-01-12T13:38:33Z)
- Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform randomization for statistical analysis with those of reward maximization.
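As one simple instance of such a combination (a plain epsilon-mixture, not necessarily either of the paper's two algorithms), assignment can mix uniform randomization with a Thompson Sampling step:

```python
import numpy as np

rng = np.random.default_rng(2)

def assign_arm(successes, failures, eps=0.2):
    """Pick an arm in a 2-arm Bernoulli experiment.

    With probability eps, assign uniformly at random (clean data for the
    statistical analysis); otherwise take a Thompson Sampling step with
    Beta(1, 1) priors (reward maximization).
    """
    if rng.random() < eps:
        return int(rng.integers(2))                 # uniform random assignment
    samples = rng.beta(np.asarray(successes) + 1, np.asarray(failures) + 1)
    return int(np.argmax(samples))                  # Thompson Sampling
```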
arXiv Detail & Related papers (2021-12-15T22:11:58Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
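For reference, a skeleton of successive elimination for sub-Gaussian arms with a Hoeffding-style confidence radius; under contamination, a robust variant would replace the empirical mean with a robust estimator, but the elimination logic has this shape.

```python
import numpy as np

def successive_elimination(pull, n_arms, delta, sigma=1.0, max_rounds=100_000):
    """Skeleton of successive elimination for sigma-sub-Gaussian arms.

    Keeps every arm whose upper confidence bound still reaches the best
    lower confidence bound; stops when one arm survives.
    """
    alive = list(range(n_arms))
    sums = np.zeros(n_arms)
    for t in range(1, max_rounds + 1):
        for arm in alive:
            sums[arm] += pull(arm)           # every surviving arm gets pulled
        means = sums[alive] / t
        # Hoeffding-style radius with a union bound over arms and rounds.
        radius = sigma * np.sqrt(2 * np.log(4 * n_arms * t * t / delta) / t)
        best = means.max()
        alive = [a for a, m in zip(alive, means) if m + radius >= best - radius]
        if len(alive) == 1:
            return alive[0]
    return max(alive, key=lambda a: sums[a])  # fallback: empirical best
```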
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Quantile Bandits for Best Arms Identification [10.294977861990203]
We consider a variant of the best arm identification task in multi-armed bandits.
Motivated by risk-averse decision-making problems, our goal is to identify a set of $m$ arms with the highest $\tau$-quantile values within a fixed budget.
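A naive fixed-budget baseline illustrating the objective (uniform allocation plus empirical $\tau$-quantiles); the paper's algorithms instead adapt the allocation using concentration bounds for quantiles, which this sketch omits.

```python
import numpy as np

def top_m_by_quantile(pull, n_arms, m, tau, budget):
    """Uniform allocation + empirical tau-quantiles (assumes budget >= n_arms)."""
    per_arm = budget // n_arms
    q = [np.quantile([pull(a) for _ in range(per_arm)], tau)
         for a in range(n_arms)]
    return list(np.argsort(q)[-m:][::-1])  # m arms with the largest estimates
```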
arXiv Detail & Related papers (2020-10-22T09:58:54Z)
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.