Best Arm Identification in Stochastic Bandits: Beyond $\beta-$optimality
- URL: http://arxiv.org/abs/2301.03785v2
- Date: Thu, 22 Jun 2023 20:34:25 GMT
- Title: Best Arm Identification in Stochastic Bandits: Beyond $\beta-$optimality
- Authors: Arpan Mukherjee and Ali Tajer
- Abstract summary: This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in multi-armed bandits in the fixed-confidence setting.
Two key metrics for assessing bandit algorithms are computational efficiency and performance optimality.
This paper introduces a framework and an algorithm for BAI that achieves optimal performance with a computationally efficient set of decision rules.
- Score: 31.359578768463752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates a hitherto unaddressed aspect of best arm
identification (BAI) in stochastic multi-armed bandits in the fixed-confidence
setting. Two key metrics for assessing bandit algorithms are computational
efficiency and performance optimality (e.g., in sample complexity). In
stochastic BAI literature, there have been advances in designing algorithms to
achieve optimal performance, but they are generally computationally expensive
to implement (e.g., optimization-based methods). There also exist approaches
with high computational efficiency, but they have provable gaps to the optimal
performance (e.g., the $\beta$-optimal approaches in top-two methods). This
paper introduces a framework and an algorithm for BAI that achieves optimal
performance with a computationally efficient set of decision rules. The central
process that facilitates this is a routine for sequentially estimating the
optimal allocations up to sufficient fidelity. Specifically, these estimates
are accurate enough for identifying the best arm (hence, achieving optimality)
but not overly accurate to an unnecessary extent that creates excessive
computational complexity (hence, maintaining efficiency). Furthermore, the
existing relevant literature focuses on the family of exponential
distributions. This paper considers a more general setting of any arbitrary
family of distributions parameterized by their mean values (under mild
regularity conditions). The optimality is established analytically, and
numerical evaluations are provided to assess the analytical guarantees and
compare the performance with those of the existing ones.
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remain elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - A Novel Ranking Scheme for the Performance Analysis of Stochastic Optimization Algorithms using the Principles of Severity [9.310464457958844]
We provide a novel ranking scheme to rank the algorithms over multiple single-objective optimization problems.
The results of the algorithms are compared using a robust bootstrapping-based hypothesis testing procedure.
arXiv Detail & Related papers (2024-05-31T19:35:34Z) - Selection of the Most Probable Best [2.1095005405219815]
We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution.
We define the most probable best (MPB) to be the solution that has the largest probability of being optimal with respect to the distribution.
We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates and prove the algorithms' sampling ratios achieve the conditions as the simulation budget increases.
arXiv Detail & Related papers (2022-07-15T15:27:27Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-robust optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to station yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods.
We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms.
We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
arXiv Detail & Related papers (2021-01-12T18:31:22Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stoc-efficientgradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of evaluations gradient independent of training set size and number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.