Dual-Directed Algorithm Design for Efficient Pure Exploration
- URL: http://arxiv.org/abs/2310.19319v1
- Date: Mon, 30 Oct 2023 07:29:17 GMT
- Title: Dual-Directed Algorithm Design for Efficient Pure Exploration
- Authors: Chao Qin and Wei You
- Abstract summary: We consider pure-exploration problems in the context of sequential adaptive experiments with a finite set of alternative options.
We derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples.
Our algorithm is optimal for $epsilon$-best-arm identification and thresholding bandit problems.
- Score: 11.492736493413103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider pure-exploration problems in the context of stochastic sequential
adaptive experiments with a finite set of alternative options. The goal of the
decision-maker is to accurately answer a query question regarding the
alternatives with high confidence with minimal measurement efforts. A typical
query question is to identify the alternative with the best performance,
leading to ranking and selection problems, or best-arm identification in the
machine learning literature. We focus on the fixed-precision setting and derive
a sufficient condition for optimality in terms of a notion of strong
convergence to the optimal allocation of samples. Using dual variables, we
characterize the necessary and sufficient conditions for an allocation to be
optimal. The use of dual variables allow us to bypass the combinatorial
structure of the optimality conditions that relies solely on primal variables.
Remarkably, these optimality conditions enable an extension of top-two
algorithm design principle, initially proposed for best-arm identification.
Furthermore, our optimality conditions give rise to a straightforward yet
efficient selection rule, termed information-directed selection, which
adaptively picks from a candidate set based on information gain of the
candidates. We outline the broad contexts where our algorithmic approach can be
implemented. We establish that, paired with information-directed selection,
top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm
identification, solving a glaring open problem in the pure exploration
literature. Our algorithm is optimal for $\epsilon$-best-arm identification and
thresholding bandit problems. Our analysis also leads to a general principle to
guide adaptations of Thompson sampling for pure-exploration problems. Numerical
experiments highlight the exceptional efficiency of our proposed algorithms
relative to existing ones.
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Dynamic Incremental Optimization for Best Subset Selection [15.8362578568708]
Best subset selection is considered the gold standard for many learning problems.
An efficient subset-dual algorithm is developed based on the primal and dual problem structures.
arXiv Detail & Related papers (2024-02-04T02:26:40Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Best Subset Selection with Efficient Primal-Dual Algorithm [24.568094642425837]
Best subset selection is considered the gold standard' for many learning problems.
In this paper, we investigate the dual forms of a family of $ell$-regularized problems.
An efficient primal-dual method has been developed based on the primal and dual problem structures.
arXiv Detail & Related papers (2022-07-05T14:07:52Z) - Information-Directed Selection for Top-Two Algorithms [13.339829037245963]
We consider the best-k-arm identification problem for multi-armed bandits.
The objective is to select the exact set of k arms with the highest mean rewards by sequentially allocating measurement effort.
We propose information-directed selection (IDS) that selects one of the top-two candidates based on a measure of information gain.
arXiv Detail & Related papers (2022-05-24T14:07:13Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Gamification of Pure Exploration for Linear Bandits [34.16123941778227]
We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear bandits.
Whileally optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for the best-arm identification in linear bandits has been elusive.
We design the first insightally optimal algorithm for fixed-confidence pure exploration in linear bandits.
arXiv Detail & Related papers (2020-07-02T08:20:35Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.