Information-Directed Selection for Top-Two Algorithms
- URL: http://arxiv.org/abs/2205.12086v3
- Date: Mon, 17 Jul 2023 09:24:31 GMT
- Title: Information-Directed Selection for Top-Two Algorithms
- Authors: Wei You, Chao Qin, Zihao Wang, Shuoguang Yang
- Abstract summary: We consider the best-k-arm identification problem for multi-armed bandits.
The objective is to select the exact set of k arms with the highest mean rewards by sequentially allocating measurement effort.
We propose information-directed selection (IDS) that selects one of the top-two candidates based on a measure of information gain.
- Score: 13.339829037245963
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We consider the best-k-arm identification problem for multi-armed bandits,
where the objective is to select the exact set of k arms with the highest mean
rewards by sequentially allocating measurement effort. We characterize the
necessary and sufficient conditions for the optimal allocation using dual
variables. Remarkably, these optimality conditions lead to an extension of the
top-two algorithm design principle (Russo, 2020), initially proposed for
best-arm identification. Furthermore, our optimality conditions induce a simple
and effective selection rule dubbed information-directed selection (IDS) that
selects one of the top-two candidates based on a measure of information gain.
As a theoretical guarantee, we prove that integrated with IDS, top-two Thompson
sampling is (asymptotically) optimal for Gaussian best-arm identification,
solving a glaring open problem in the pure exploration literature (Russo,
2020). As a by-product, we show that for k > 1, top-two algorithms cannot
achieve optimality even when the algorithm has access to the unknown "optimal"
tuning parameter. Numerical experiments show the superior performance of the
proposed top-two algorithms with IDS and considerable improvement compared with
algorithms without adaptive selection.
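The top-two mechanism described in the abstract can be sketched as follows for Gaussian bandits. Note that the selection step below uses a posterior-uncertainty proxy, not the paper's actual information-gain measure; the function name `ttts_ids`, its parameters, and the capped-resampling fallback for the challenger are all illustrative assumptions.

```python
import numpy as np

def ttts_ids(means, sigma=1.0, budget=500, seed=0):
    """Top-two Thompson sampling for Gaussian best-arm identification,
    with a simplified stand-in for the information-directed selection
    step (allocate to whichever candidate has the more uncertain
    posterior). This is a sketch, not the paper's IDS rule."""
    rng = np.random.default_rng(seed)
    k = len(means)
    counts = np.ones(k)                       # one forced sample per arm
    sums = np.array([rng.normal(m, sigma) for m in means])
    for _ in range(budget - k):
        post_mean = sums / counts
        post_std = sigma / np.sqrt(counts)
        # Leader: best arm under one posterior sample.
        leader = int(np.argmax(rng.normal(post_mean, post_std)))
        # Challenger: resample until a different arm wins (capped for
        # practicality, since resampling can stall once posteriors concentrate).
        challenger = leader
        for _ in range(100):
            cand = int(np.argmax(rng.normal(post_mean, post_std)))
            if cand != leader:
                challenger = cand
                break
        if challenger == leader:              # fallback: empirical runner-up
            order = np.argsort(post_mean)
            challenger = int(order[-2] if order[-1] == leader else order[-1])
        # Simplified selection between the top-two candidates.
        pick = leader if post_std[leader] >= post_std[challenger] else challenger
        sums[pick] += rng.normal(means[pick], sigma)
        counts[pick] += 1
    return int(np.argmax(sums / counts))
```

The resampling cap and runner-up fallback are practical workarounds for the well-known stalling of the challenger draw once the posterior concentrates; the paper's IDS rule replaces the crude uncertainty comparison above with a principled information-gain criterion.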
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the learner can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Dual-Directed Algorithm Design for Efficient Pure Exploration [11.492736493413103]
We consider pure-exploration problems in the context of sequential adaptive experiments with a finite set of alternative options.
We derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples.
Our algorithm is optimal for $\varepsilon$-best-arm identification and thresholding bandit problems.
arXiv Detail & Related papers (2023-10-30T07:29:17Z) - Choosing Answers in $\varepsilon$-Best-Answer Identification for Linear
Bandits [0.8122270502556374]
In a pure-exploration problem, information is gathered sequentially to answer a question on the environment.
We show that picking the answer with highest mean does not allow an algorithm to reach optimality in terms of expected sample complexity.
We develop a simple procedure to adapt best-arm identification algorithms to tackle $\varepsilon$-best-answer identification in transductive linear bandits.
arXiv Detail & Related papers (2022-06-09T12:27:51Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
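The successive-elimination template mentioned above can be sketched generically: sample every surviving arm each round and drop arms whose upper confidence bound falls below the best lower confidence bound. The function name, confidence radius, and parameters below follow the standard sub-Gaussian template and are not taken from the paper.

```python
import numpy as np

def successive_elimination(pull, k, delta=0.05, max_rounds=2000):
    """Generic successive elimination for best-arm identification with
    1-sub-Gaussian rewards (a standard sketch, not the paper's specific
    algorithm). `pull(a)` returns one reward sample from arm a."""
    active = list(range(k))
    sums = np.zeros(k)
    for t in range(1, max_rounds + 1):
        for a in active:                      # sample every surviving arm
            sums[a] += pull(a)
        means = np.array([sums[a] / t for a in active])
        # Hoeffding-style confidence radius with a union bound over
        # arms and rounds; shrinks like sqrt(log(t)/t).
        rad = np.sqrt(2.0 * np.log(4.0 * k * t * t / delta) / t)
        best_lcb = means.max() - rad
        # Drop arms whose UCB falls below the best LCB.
        active = [a for a, m in zip(active, means) if m + rad >= best_lcb]
        if len(active) == 1:
            break
    return active[0]
```

With probability at least 1 - delta, the returned arm is the true best arm; the logarithmic factor in the radius is exactly where the gap to the optimal sample complexity mentioned above arises.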
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - RSO: A Novel Reinforced Swarm Optimization Algorithm for Feature
Selection [0.0]
In this paper, we propose a novel feature selection algorithm named Reinforced Swarm Optimization (RSO).
This algorithm embeds the widely used Bee Swarm Optimization (BSO) algorithm along with Reinforcement Learning (RL) to maximize the reward of a superior search agent and punish the inferior ones.
The proposed method is evaluated on 25 widely known UCI datasets containing a perfect blend of balanced and imbalanced data.
arXiv Detail & Related papers (2021-07-29T17:38:04Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Gamification of Pure Exploration for Linear Bandits [34.16123941778227]
We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear bandits.
While asymptotically optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive.
We design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits.
arXiv Detail & Related papers (2020-07-02T08:20:35Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.