Dual-Directed Algorithm Design for Efficient Pure Exploration
- URL: http://arxiv.org/abs/2310.19319v1
- Date: Mon, 30 Oct 2023 07:29:17 GMT
- Title: Dual-Directed Algorithm Design for Efficient Pure Exploration
- Authors: Chao Qin and Wei You
- Abstract summary: We consider pure-exploration problems in the context of sequential adaptive experiments with a finite set of alternative options.
We derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples.
Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems.
- Score: 11.492736493413103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider pure-exploration problems in the context of stochastic sequential
adaptive experiments with a finite set of alternative options. The goal of the
decision-maker is to accurately answer a query question regarding the
alternatives with high confidence with minimal measurement efforts. A typical
query question is to identify the alternative with the best performance,
leading to ranking and selection problems, or best-arm identification in the
machine learning literature. We focus on the fixed-precision setting and derive
a sufficient condition for optimality in terms of a notion of strong
convergence to the optimal allocation of samples. Using dual variables, we
characterize the necessary and sufficient conditions for an allocation to be
optimal. The use of dual variables allows us to bypass the combinatorial
structure of the optimality conditions that arises when relying solely on primal variables.
Remarkably, these optimality conditions enable an extension of the top-two
algorithm design principle, initially proposed for best-arm identification.
Furthermore, our optimality conditions give rise to a straightforward yet
efficient selection rule, termed information-directed selection, which
adaptively picks from a candidate set based on information gain of the
candidates. We outline the broad contexts where our algorithmic approach can be
implemented. We establish that, paired with information-directed selection,
top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm
identification, solving a glaring open problem in the pure exploration
literature. Our algorithm is optimal for $\epsilon$-best-arm identification and
thresholding bandit problems. Our analysis also leads to a general principle to
guide adaptations of Thompson sampling for pure-exploration problems. Numerical
experiments highlight the exceptional efficiency of our proposed algorithms
relative to existing ones.
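The top-two-with-selection scheme described above can be sketched in a few lines. The snippet below is an illustrative, hypothetical round of top-two Thompson sampling for Gaussian best-arm identification: it assumes unit-variance rewards with a flat prior, and proxies the paper's information-directed selection rule with posterior variance as a crude stand-in for information gain; the actual IDS rule in the paper is more refined.

```python
import numpy as np

def ttts_ids_round(mu_hat, counts, rng):
    """One sampling decision of top-two Thompson sampling with a
    stand-in information-directed selection (IDS) tie-break.

    Sketch only: unit-variance Gaussian rewards, flat prior, and
    posterior variance as a proxy for information gain.
    """
    post_var = 1.0 / np.maximum(counts, 1)         # posterior variance per arm
    theta = rng.normal(mu_hat, np.sqrt(post_var))  # Thompson sample -> leader
    leader = int(np.argmax(theta))
    challenger = leader
    while challenger == leader:                    # resample until a rival tops
        theta = rng.normal(mu_hat, np.sqrt(post_var))
        challenger = int(np.argmax(theta))
    # IDS proxy: measure the candidate whose posterior is more uncertain
    return leader if post_var[leader] >= post_var[challenger] else challenger
```

The key design point is that the top-two rule produces a *pair* of candidates (leader and challenger), and the selection rule decides which of the two to actually sample; the paper's contribution is choosing between them by information gain rather than a fixed coin flip.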
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the learner can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Dynamic Incremental Optimization for Best Subset Selection [15.8362578568708]
Best subset selection is considered the gold standard for many learning problems.
An efficient subset-dual algorithm is developed based on the primal and dual problem structures.
arXiv Detail & Related papers (2024-02-04T02:26:40Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Best Subset Selection with Efficient Primal-Dual Algorithm [24.568094642425837]
Best subset selection is considered the "gold standard" for many learning problems.
In this paper, we investigate the dual forms of a family of $\ell$-regularized problems.
An efficient primal-dual method has been developed based on the primal and dual problem structures.
arXiv Detail & Related papers (2022-07-05T14:07:52Z) - Choosing Answers in $\varepsilon$-Best-Answer Identification for Linear Bandits [0.8122270502556374]
In a pure-exploration problem, information is gathered sequentially to answer a question on the environment.
We show that picking the answer with highest mean does not allow an algorithm to reach optimality in terms of expected sample complexity.
We develop a simple procedure to adapt best-arm identification algorithms to tackle $\varepsilon$-best-answer identification in transductive linear bandits.
arXiv Detail & Related papers (2022-06-09T12:27:51Z) - Information-Directed Selection for Top-Two Algorithms [13.339829037245963]
We consider the best-k-arm identification problem for multi-armed bandits.
The objective is to select the exact set of k arms with the highest mean rewards by sequentially allocating measurement effort.
We propose information-directed selection (IDS) that selects one of the top-two candidates based on a measure of information gain.
arXiv Detail & Related papers (2022-05-24T14:07:13Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z) - Gamification of Pure Exploration for Linear Bandits [34.16123941778227]
We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear bandits.
While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has remained elusive.
We design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits.
arXiv Detail & Related papers (2020-07-02T08:20:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.