Top Two Algorithms Revisited
- URL: http://arxiv.org/abs/2206.05979v1
- Date: Mon, 13 Jun 2022 09:03:24 GMT
- Title: Top Two Algorithms Revisited
- Authors: Marc Jourdan, Rémy Degenne, Dorian Baudry, Rianne de Heide and Emilie Kaufmann
- Abstract summary: Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models.
We provide a general analysis of Top Two methods, which identifies desirable properties of the leader, the challenger, and the (possibly non-parametric) distributions of the arms.
Our proof method demonstrates in particular that the sampling step used to select the leader inherited from Thompson sampling can be replaced by other choices.
- Score: 14.783452541904365
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Top Two algorithms arose as an adaptation of Thompson sampling to best arm
identification in multi-armed bandit models (Russo, 2016), for parametric
families of arms. They select the next arm to sample from by randomizing among
two candidate arms, a leader and a challenger. Despite their good empirical
performance, theoretical guarantees for fixed-confidence best arm
identification have only been obtained when the arms are Gaussian with known
variances. In this paper, we provide a general analysis of Top Two methods,
which identifies desirable properties of the leader, the challenger, and the
(possibly non-parametric) distributions of the arms. As a result, we obtain
theoretically supported Top Two algorithms for best arm identification with
bounded distributions. Our proof method demonstrates in particular that the
sampling step used to select the leader inherited from Thompson sampling can be
replaced by other choices, like selecting the empirical best arm.
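As a concrete illustration, here is a minimal Python sketch of one Top Two sampling step for Gaussian arms with known variance, in the style of Thompson-sampling-based Top Two methods. The `empirical_leader` flag reflects the abstract's point that the leader's sampling step can be replaced by selecting the empirical best arm; the names and the exact challenger rule below are illustrative, not the paper's precise algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_two_step(means, counts, sigma=1.0, beta=0.5, empirical_leader=False):
    """One Top Two sampling step for Gaussian arms with known variance sigma.

    means  : empirical mean reward of each arm
    counts : number of pulls of each arm (all assumed >= 1)
    beta   : probability of playing the leader rather than the challenger
    """
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)

    def posterior_sample():
        # Gaussian posterior under an (improper) flat prior.
        return rng.normal(means, sigma / np.sqrt(counts))

    # Leader: a Thompson sample (as in TTTS), or the empirical best arm,
    # one of the alternative choices the paper's analysis covers.
    leader = int(np.argmax(means)) if empirical_leader \
        else int(np.argmax(posterior_sample()))

    # Challenger: re-sample the posterior until an arm other than the
    # leader comes out on top (the TTTS challenger rule).
    challenger = leader
    while challenger == leader:
        challenger = int(np.argmax(posterior_sample()))

    # Randomize between the two candidate arms.
    return leader if rng.random() < beta else challenger

# e.g. next_arm = top_two_step(means=[0.4, 0.5, 0.3], counts=[10, 10, 10])
```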
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
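A minimal PyTorch sketch of fitting a reward model to pairwise preference (duel) feedback via a Bradley-Terry likelihood; the architecture, loss, and names are illustrative assumptions, not necessarily the paper's exact estimator.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Tiny MLP reward model f(x) over arm contexts."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def fit_from_duels(model, x_win, x_lose, epochs=200, lr=1e-2):
    """Fit f so that P(win beats lose) = sigmoid(f(x_win) - f(x_lose)).

    x_win / x_lose are (N, dim) tensors holding the contexts of the
    preferred and rejected arms from past duels (a Bradley-Terry model).
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        logits = model(x_win) - model(x_lose)
        # Every recorded duel is a "win", so the target is all ones.
        loss = nn.functional.binary_cross_entropy_with_logits(
            logits, torch.ones_like(logits))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```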
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
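A hedged sketch of the round-wise rejection idea: every surviving arm is sampled each round, and any arm whose empirical gap to the current leader exceeds a budget-dependent threshold is dropped immediately. The actual rejection rule in the paper comes from a large deviation analysis; `threshold` below is a caller-supplied placeholder for it.

```python
import numpy as np

def reject_any_round_bai(pull, n_arms, budget, threshold):
    """Fixed-budget BAI with rejections allowed in any round.

    pull(arm)             -> one reward sample from the given arm
    threshold(t, budget)  -> placeholder for the paper's large-deviation
                             rejection threshold at time t
    """
    alive = list(range(n_arms))
    sums = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    t = 0
    while t < budget and len(alive) > 1:
        for arm in alive:
            if t >= budget:
                break
            sums[arm] += pull(arm)
            counts[arm] += 1
            t += 1
        means = sums[alive] / np.maximum(counts[alive], 1)
        best = means.max()
        # Unlike successive rejects, an arm can be dropped in *any* round,
        # as soon as its empirical gap to the leader is too large.
        alive = [a for a, m in zip(alive, means)
                 if best - m <= threshold(t, budget)]
    means = sums / np.maximum(counts, 1)
    return max(alive, key=lambda a: means[a])  # empirical best survivor
```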
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Eliciting Kemeny Rankings [6.971011179091351]
We find approximation bounds for Kemeny rankings dependent on confidence intervals over estimated winning probabilities of arms.
We formulate several adaptive sampling methods that use look-aheads to estimate how much confidence intervals might be tightened.
arXiv Detail & Related papers (2023-12-18T19:14:42Z)
- Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a Fixed Budget [10.470114319701576]
This study investigates the experimental design problem for identifying the arm with the highest expected outcome.
Under the assumption that the variances are known, we propose the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy.
We show that the GNA-EBA strategy is asymptotically optimal in the sense that its probability of misidentification aligns with the lower bounds.
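A rough Python sketch of a Neyman-allocation-plus-empirical-best-arm strategy under known variances. Splitting the budget in proportion to the standard deviations is the classical two-armed Neyman allocation; the paper's *generalized* allocation may differ in its details, and all names here are illustrative.

```python
import numpy as np

def neyman_eba(pull, sigmas, budget):
    """Allocate pulls proportionally to known standard deviations, then
    recommend the empirical best arm. Assumes budget >= number of arms."""
    sigmas = np.asarray(sigmas, dtype=float)
    alloc = np.maximum(np.round(budget * sigmas / sigmas.sum()), 1).astype(int)
    # (rounding means the total may deviate from `budget` by a few pulls)
    means = np.array([np.mean([pull(a) for _ in range(n)])
                      for a, n in enumerate(alloc)])
    return int(np.argmax(means))  # the "EBA" recommendation
```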
arXiv Detail & Related papers (2023-10-30T17:52:46Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a fixed and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- Covariance Adaptive Best Arm Identification [0.0]
The goal is to identify the arm with the highest mean reward with a probability of at least $1 - \delta$, while minimizing the number of arm pulls.
We propose a more flexible scenario where arms can be dependent and rewards can be sampled simultaneously.
This framework is relevant in various applications, such as clinical trials, where similarities between patients or drugs suggest underlying correlations.
arXiv Detail & Related papers (2023-06-05T06:57:09Z)
- Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a sampling rule with randomized sampling (RS) following the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is asymptotically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
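A sketch of the overall shape of such a strategy under illustrative choices: the target allocation is estimated online from empirical standard deviations, the next arm is drawn at random from it (the RS step), and each mean is estimated with augmented inverse probability weighted (AIPW) scores. The paper's exact estimators and allocation differ.

```python
import numpy as np

def rs_aipw(pull, budget, rng=np.random.default_rng(1)):
    """Two-armed fixed-budget BAI: randomized sampling + AIPW recommendation.

    pull(arm) -> one reward sample. The target allocation for arm 0 is
    estimated online as std_0 / (std_0 + std_1).
    """
    obs = [[], []]                    # rewards observed per arm
    aipw = np.zeros(2)                # running sums of AIPW scores
    for t in range(budget):
        stds = [max(np.std(o), 1e-3) if len(o) > 1 else 1.0 for o in obs]
        w0 = stds[0] / (stds[0] + stds[1])
        probs = np.array([w0, 1.0 - w0])
        mu_hat = np.array([np.mean(o) if o else 0.0 for o in obs])
        arm = int(rng.random() >= w0)         # randomized sampling (RS) step
        x = pull(arm)
        obs[arm].append(x)
        # AIPW score: plug-in mean plus inverse-propensity-weighted residual,
        # which corrects for the adaptive (data-dependent) sampling.
        score = mu_hat.copy()
        score[arm] += (x - mu_hat[arm]) / probs[arm]
        aipw += score
    return int(np.argmax(aipw / budget))      # recommend the larger AIPW mean
```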
arXiv Detail & Related papers (2022-01-12T13:38:33Z)
- Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform randomization for statistical analysis with those of reward maximization.
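As one simple instance of such a combination (a plain epsilon-mixture, not necessarily either of the paper's two algorithms), assignment can mix uniform randomization with a Thompson Sampling step:

```python
import numpy as np

rng = np.random.default_rng(2)

def assign_arm(successes, failures, eps=0.2):
    """Pick an arm in a 2-arm Bernoulli experiment.

    With probability eps, assign uniformly at random (clean data for the
    statistical analysis); otherwise take a Thompson Sampling step with
    Beta(1, 1) priors (reward maximization).
    """
    if rng.random() < eps:
        return int(rng.integers(2))                 # uniform random assignment
    samples = rng.beta(np.asarray(successes) + 1, np.asarray(failures) + 1)
    return int(np.argmax(samples))                  # Thompson Sampling
```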
arXiv Detail & Related papers (2021-12-15T22:11:58Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
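For reference, a skeleton of successive elimination for sub-Gaussian arms with a Hoeffding-style confidence radius; under contamination, a robust variant would replace the empirical mean with a robust estimator, but the elimination logic has this shape.

```python
import numpy as np

def successive_elimination(pull, n_arms, delta, sigma=1.0, max_rounds=100_000):
    """Skeleton of successive elimination for sigma-sub-Gaussian arms.

    Keeps every arm whose upper confidence bound still reaches the best
    lower confidence bound; stops when one arm survives.
    """
    alive = list(range(n_arms))
    sums = np.zeros(n_arms)
    for t in range(1, max_rounds + 1):
        for arm in alive:
            sums[arm] += pull(arm)           # every surviving arm gets pulled
        means = sums[alive] / t
        # Hoeffding-style radius with a union bound over arms and rounds.
        radius = sigma * np.sqrt(2 * np.log(4 * n_arms * t * t / delta) / t)
        best = means.max()
        alive = [a for a, m in zip(alive, means) if m + radius >= best - radius]
        if len(alive) == 1:
            return alive[0]
    return max(alive, key=lambda a: sums[a])  # fallback: empirical best
```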
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Quantile Bandits for Best Arms Identification [10.294977861990203]
We consider a variant of the best arm identification task in multi-armed bandits.
Motivated by risk-averse decision-making problems, our goal is to identify a set of $m$ arms with the highest $\tau$-quantile values within a fixed budget.
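A naive fixed-budget baseline illustrating the objective (uniform allocation plus empirical $\tau$-quantiles); the paper's algorithms instead adapt the allocation using concentration bounds for quantiles, which this sketch omits.

```python
import numpy as np

def top_m_by_quantile(pull, n_arms, m, tau, budget):
    """Uniform allocation + empirical tau-quantiles (assumes budget >= n_arms)."""
    per_arm = budget // n_arms
    q = [np.quantile([pull(a) for _ in range(per_arm)], tau)
         for a in range(n_arms)]
    return list(np.argsort(q)[-m:][::-1])  # m arms with the largest estimates
```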
arXiv Detail & Related papers (2020-10-22T09:58:54Z)
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.