Related papers: Non-Asymptotic Analysis of (Sticky) Track-and-Stop

Non-Asymptotic Analysis of (Sticky) Track-and-Stop

URL: http://arxiv.org/abs/2505.22475v1
Date: Wed, 28 May 2025 15:26:55 GMT
Title: Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Authors: Riccardo Poiani, Martino Bernasconi, Andrea Celli,
Abstract summary: In pure exploration problems, a statistician sequentially collects information to answer a question about some and unknown environment.<n>The Track-and-Stop algorithm is a pioneering method to solve these problems.<n>We provide non-asymptotic guarantees for both algorithms.
Score: 15.79182979494432
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In pure exploration problems, a statistician sequentially collects information to answer a question about some stochastic and unknown environment. The probability of returning a wrong answer should not exceed a maximum risk parameter $\delta$ and good algorithms make as few queries to the environment as possible. The Track-and-Stop algorithm is a pioneering method to solve these problems. Specifically, it is well-known that it enjoys asymptotic optimality sample complexity guarantees for $\delta\to 0$ whenever the map from the environment to its correct answers is single-valued (e.g., best-arm identification with a unique optimal arm). The Sticky Track-and-Stop algorithm extends these results to settings where, for each environment, there might exist multiple correct answers (e.g., $\epsilon$-optimal arm identification). Although both methods are optimal in the asymptotic regime, their non-asymptotic guarantees remain unknown. In this work, we fill this gap and provide non-asymptotic guarantees for both algorithms.

Related papers

Optimal Top-Two Method for Best Arm Identification and Fluid Analysis [15.353009236788262]
We propose an optimal top-2 type algorithm for the best arm identification problem.<n>We show that the proposed algorithm is optimal as $delta rightarrow 0$.
arXiv Detail & Related papers (2024-03-14T06:14:07Z)
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel. An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model. The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
Globally Optimal Algorithms for Fixed-Budget Best Arm Identification [16.500749121196986]
We characterize the optimal rate as a result of global optimization over all possible parameters. We show that this rate is indeed achievable by introducing a conceptual algorithm called delayed optimal tracking (DOT)
arXiv Detail & Related papers (2022-06-09T17:42:19Z)
Choosing Answers in $\varepsilon$-Best-Answer Identification for Linear Bandits [0.8122270502556374]
In a pure-exploration problem, information is gathered sequentially to answer a question on the environment. We show that picking the answer with highest mean does not allow an algorithm to reach optimality in terms of expected sample complexity. We develop a simple procedure to adapt best-arm identification algorithms to tackle $varepsilon$-best-answer identification in transductive linear bandits.
arXiv Detail & Related papers (2022-06-09T12:27:51Z)
On the complexity of All $\varepsilon$-Best Arms Identification [2.1485350418225244]
We consider the problem of identifying all the $varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards. We propose a Track-and-Stop algorithm that identifies the set of $varepsilon$-good arms w.h.p and enjoys optimality (when $delta$ goes to zero) in terms of the expected sample complexity.
arXiv Detail & Related papers (2022-02-13T10:54:52Z)
Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
We show a non-asymptotic bound of the order $t_mathrmmix tfracdn$ on the squared error of the last iterate of a standard scheme. We derive corollaries of these results for policy evaluation with Markov noise.
arXiv Detail & Related papers (2021-12-23T18:47:50Z)
Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits. Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes. Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms. We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.