An $\varepsilon$-Best-Arm Identification Algorithm for Fixed-Confidence
and Beyond
- URL: http://arxiv.org/abs/2305.16041v2
- Date: Mon, 6 Nov 2023 09:47:28 GMT
- Title: An $\varepsilon$-Best-Arm Identification Algorithm for Fixed-Confidence
and Beyond
- Authors: Marc Jourdan, R\'emy Degenne, Emilie Kaufmann
- Abstract summary: We propose EB-TC$\varepsilon$, a novel sampling rule for best arm identification in bandits.
We provide three types of theoretical guarantees for EB-TC$\varepsilon$.
We show through numerical simulations that EB-TC$\varepsilon$ performs favorably compared to existing algorithms.
- Score: 15.871535287520393
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose EB-TC$\varepsilon$, a novel sampling rule for $\varepsilon$-best
arm identification in stochastic bandits. It is the first instance of a Top Two
algorithm analyzed for approximate best arm identification. EB-TC$\varepsilon$
is an *anytime* sampling rule that can therefore be employed without
modification for fixed confidence or fixed budget identification (without prior
knowledge of the budget). We provide three types of theoretical guarantees for
EB-TC$\varepsilon$. First, we prove bounds on its expected sample complexity in
the fixed confidence setting, notably showing its asymptotic optimality in
combination with an adaptive tuning of its exploration parameter. We complement
these findings with upper bounds on its probability of error at any time and
for any error parameter, which further yield upper bounds on its simple regret
at any time. Finally, we show through numerical simulations that
EB-TC$\varepsilon$ performs favorably compared to existing algorithms, in
different settings.
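As a rough illustration of the Top Two principle behind EB-TC$\varepsilon$ (not the paper's exact rule), the following hedged sketch picks an empirical-best leader and a transportation-cost challenger, then samples one of the two at random. It assumes unit-variance Gaussian arms as a proxy; `epsilon` is the slack of approximate identification and `beta` the leader-sampling probability, names chosen here for illustration only.

```python
import numpy as np

def top_two_sample(means, counts, epsilon=0.1, beta=0.5, rng=None):
    """One step of a generic Top Two sampling rule (illustrative sketch).

    Leader: empirical best arm. Challenger: arm minimizing an empirical
    transportation-cost proxy relative to the leader. This is NOT the
    exact EB-TC rule from the paper, only a simplified illustration.
    """
    if rng is None:
        rng = np.random.default_rng()
    leader = int(np.argmax(means))
    # Transportation-cost proxy vs. the leader (unit-variance Gaussian),
    # shifted by the slack epsilon used in approximate identification.
    costs = np.full(len(means), np.inf)
    for a in range(len(means)):
        if a == leader:
            continue
        gap = means[leader] - means[a] + epsilon
        costs[a] = gap**2 / (2.0 * (1.0 / counts[leader] + 1.0 / counts[a]))
    challenger = int(np.argmin(costs))
    # Sample the leader with probability beta, otherwise the challenger.
    return leader if rng.random() < beta else challenger
```

Because the rule only ever proposes the leader or its closest challenger, exploration concentrates on the arms that are hardest to distinguish, which is what drives the sample-complexity guarantees described above.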
Related papers
- Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits [55.957560311008926]
We propose a piecewise stationary linear bandit (PSLB) model where the quality of an arm is measured by its return averaged over all contexts.
PS$\varepsilon$BAI$+$ is guaranteed to identify an $\varepsilon$-optimal arm with probability $\ge 1-\delta$ and with a minimal number of samples.
arXiv Detail & Related papers (2024-10-10T06:15:42Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - On the complexity of All $\varepsilon$-Best Arms Identification [2.1485350418225244]
We consider the problem of identifying all the $\varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards.
We propose a Track-and-Stop algorithm that identifies the set of $\varepsilon$-good arms w.h.p. and enjoys optimality (when $\delta$ goes to zero) in terms of the expected sample complexity.
arXiv Detail & Related papers (2022-02-13T10:54:52Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a *misspecified* kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Regret and Belief Complexity Trade-off in Gaussian Process Bandits via
Information Thresholding [42.669970064867556]
We show how to characterize the trade-off between regret bounds of GP bandit algorithms and complexity of the posterior distributions.
We observe state of the art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization.
arXiv Detail & Related papers (2020-03-23T21:05:15Z)
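Several of the related papers above build on elimination-style identification. As a hedged illustration, here is a minimal textbook successive-elimination sketch, assuming a reward callback `pull(a)` (a hypothetical interface, not the exact variant analyzed in any of the papers listed):

```python
import numpy as np

def successive_elimination(pull, n_arms, delta=0.05, max_rounds=2000):
    """Textbook successive elimination (illustrative sketch only).

    `pull(a)` is an assumed callback returning one stochastic reward
    for arm `a`; `delta` is the target failure probability.
    """
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    for _ in range(max_rounds):
        if len(active) == 1:
            break
        for a in active:          # one pull per active arm per round
            sums[a] += pull(a)
            counts[a] += 1
        means = sums[active] / counts[active]
        t = counts[active][0]     # common pull count of active arms
        # Sub-Gaussian confidence radius (union bound over arms and rounds).
        radius = np.sqrt(2 * np.log(4 * n_arms * t**2 / delta) / t)
        best = means.max()
        # Drop arms whose empirical mean trails the best by > 2 * radius.
        active = [a for a, m in zip(active, means) if m >= best - 2 * radius]
    return max(active, key=lambda a: sums[a] / counts[a])
```

The confidence radius shrinks as $O(\sqrt{\log t / t})$, so suboptimal arms are discarded once their gap exceeds twice the radius, which is the mechanism behind the gap-dependent sample-complexity bounds mentioned above.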
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.