Data-driven optimal stopping: A pure exploration analysis
- URL: http://arxiv.org/abs/2312.05880v1
- Date: Sun, 10 Dec 2023 13:16:01 GMT
- Title: Data-driven optimal stopping: A pure exploration analysis
- Authors: S\"oren Christensen, Niklas Dexheimer, Claudia Strauch
- Abstract summary: Minimax optimality is verified by completing the upper bound results with matching lower bounds on the simple regret.
We investigate how our results on the simple regret transfer to the cumulative regret for a specific exploration-exploitation strategy.
- Score: 1.0742675209112622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The standard theory of optimal stopping is based on the idealised assumption
that the underlying process is essentially known. In this paper, we drop this
restriction and study data-driven optimal stopping for a general diffusion
process, focusing on investigating the statistical performance of the proposed
estimator of the optimal stopping barrier. More specifically, we derive
non-asymptotic upper bounds on the simple regret, along with uniform and
non-asymptotic PAC bounds. Minimax optimality is verified by completing the
upper bound results with matching lower bounds on the simple regret. All
results are shown both under general conditions on the payoff functions and
under more refined assumptions that mimic the margin condition used in binary
classification, leading to an improved rate of convergence. Additionally, we
investigate how our results on the simple regret transfer to the cumulative
regret for a specific exploration-exploitation strategy, both with respect to
lower bounds and upper bounds.
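The estimator analyzed in the paper targets the optimal stopping barrier of a general diffusion; the precise construction is given in the paper itself. Purely as an illustrative sketch of the data-driven idea, the following Python snippet picks a one-sided stopping barrier for a simulated Ornstein-Uhlenbeck process by grid search against a Monte Carlo estimate of the discounted payoff. The OU dynamics, the payoff g(x) = x, the discount rate, and the barrier grid are all assumptions made for this example, not the authors' procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ou(x0, theta, mu, sigma, dt, n_steps):
    """Euler-Maruyama path of an Ornstein-Uhlenbeck diffusion (toy model)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + sigma * np.sqrt(dt) * rng.normal()
    return x

def discounted_payoff(path, barrier, r, dt):
    """Payoff of the first-passage rule tau_b = inf{t : X_t >= b}, discounted at rate r."""
    hits = np.nonzero(path >= barrier)[0]
    if hits.size == 0:
        return 0.0  # never stopped within the simulated horizon
    tau = hits[0]
    return np.exp(-r * tau * dt) * path[tau]

# Grid search: estimate the value of each candidate barrier by Monte Carlo,
# then return the empirical argmax as the data-driven barrier estimate.
paths = [simulate_ou(x0=0.0, theta=1.0, mu=0.5, sigma=0.4, dt=0.01, n_steps=2000)
         for _ in range(500)]
barriers = np.linspace(0.0, 2.0, 41)
values = [np.mean([discounted_payoff(p, b, r=0.05, dt=0.01) for p in paths])
          for b in barriers]
b_hat = barriers[int(np.argmax(values))]
print(f"estimated stopping barrier: {b_hat:.3f}")
```

The simple regret of such a b_hat is the value gap between the true optimal barrier and the estimated one; the paper's bounds quantify exactly this kind of gap without assuming the dynamics are known.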
Related papers
- Regret Optimality of GP-UCB [12.323109084902228]
Gaussian Process Upper Confidence Bound (GP-UCB) is one of the most popular methods for optimizing black-box functions with noisy observations.
We establish new upper bounds on both the simple and cumulative regret of GP-UCB when the objective function to optimize admits a certain smoothness property.
With the same level of exploration, GP-UCB can simultaneously achieve optimality in both simple and cumulative regret.
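For context, here is a minimal self-contained sketch of the standard GP-UCB loop the abstract refers to, built on a hand-rolled RBF-kernel GP posterior; the objective function, noise level, and the exploration schedule beta_t are illustrative assumptions, not the choices analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_tr, y_tr, x_te, noise=1e-2):
    """Posterior mean and std of a zero-mean GP with RBF kernel."""
    K = rbf(x_tr, x_tr) + noise * np.eye(len(x_tr))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    Ks = rbf(x_te, x_tr)
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(rbf(x_te, x_te)) - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

f = lambda x: np.sin(3 * x) + 0.5 * x      # unknown objective (illustrative)
grid = np.linspace(0.0, 2.0, 200)          # candidate query points
xs = [rng.uniform(0.0, 2.0)]
ys = [f(xs[0]) + 0.1 * rng.normal()]

for t in range(1, 30):
    beta_t = 2.0 * np.log((t + 1) ** 2)    # one common exploration schedule
    mu, sd = gp_posterior(np.array(xs), np.array(ys), grid)
    x_next = grid[np.argmax(mu + np.sqrt(beta_t) * sd)]   # UCB acquisition
    xs.append(x_next)
    ys.append(f(x_next) + 0.1 * rng.normal())

mu, _ = gp_posterior(np.array(xs), np.array(ys), grid)
print(f"recommendation after 30 rounds: x = {grid[np.argmax(mu)]:.3f}")
```

Simple regret is then measured at the final recommendation and cumulative regret over all queried points; the abstract's claim is that one exploration level serves both objectives.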
arXiv Detail & Related papers (2023-12-03T13:20:08Z)
- Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency [59.48096489854697]
The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures.
We prove non-asymptotic upper bounds on the mean-squared error of such procedures.
We establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds.
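A canonical example of such a linear-functional estimator is the doubly-robust (AIPW) estimate of a target policy's value from logged bandit data, which combines an outcome model with an importance-weighted residual correction. The sketch below is a generic textbook version on synthetic two-action data, offered as background; the data-generating process and all names are assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def aipw_value(rewards, actions, propensities, q_hat, target_probs):
    """Doubly-robust (AIPW) estimate of the target policy's value:
    plug-in outcome model plus an importance-weighted residual correction."""
    n = len(rewards)
    idx = np.arange(n)
    plug_in = np.sum(target_probs * q_hat, axis=1)    # E_{a ~ target}[q_hat]
    w = target_probs[idx, actions] / propensities     # importance weights
    correction = w * (rewards - q_hat[idx, actions])
    return np.mean(plug_in + correction)

# Synthetic two-action logged data (all quantities illustrative).
n = 5000
true_q = np.array([0.3, 0.7])                    # context-free mean rewards
target_probs = np.tile([0.1, 0.9], (n, 1))       # policy we want to evaluate
logging_probs = np.tile([0.6, 0.4], (n, 1))      # behavior policy
actions = (rng.uniform(size=n) < logging_probs[:, 1]).astype(int)
propensities = logging_probs[np.arange(n), actions]
rewards = rng.binomial(1, true_q[actions]).astype(float)
q_hat = np.tile(true_q + 0.05, (n, 1))           # deliberately biased outcome model

print(f"AIPW estimate: {aipw_value(rewards, actions, propensities, q_hat, target_probs):.3f}")
print(f"true value:    {float(target_probs[0] @ true_q):.3f}")
```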
arXiv Detail & Related papers (2022-09-26T23:50:55Z)
- Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective [49.17352150219212]
Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL).
We show how to analyze this algorithm from the Stochastic Differential Equation (SDE) perspective.
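To make the object of study concrete, here is a hedged toy implementation of FedAvg / Local SGD on a synthetic least-squares problem with heterogeneous clients; the model, step size, and synchronization schedule are illustrative assumptions, and the SDE analysis itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

def local_sgd(w, X, y, lr, local_steps, batch=32):
    """Run minibatch SGD on one client's least-squares loss, starting from w."""
    for _ in range(local_steps):
        idx = rng.integers(0, len(y), size=batch)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * grad
    return w

def fedavg(clients, rounds=50, local_steps=10, lr=0.05, dim=5):
    """FedAvg: broadcast the global model, run local SGD, average the iterates."""
    w = np.zeros(dim)
    for _ in range(rounds):
        w = np.mean([local_sgd(w.copy(), X, y, lr, local_steps)
                     for X, y in clients], axis=0)
    return w

# Heterogeneous synthetic clients sharing one ground-truth parameter.
w_star = rng.normal(size=5)
clients = [(X, X @ w_star + 0.1 * rng.normal(size=200))
           for X in (rng.normal(size=(200, 5)) for _ in range(4))]

w_hat = fedavg(clients)
print(f"parameter error: {np.linalg.norm(w_hat - w_star):.4f}")
```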
arXiv Detail & Related papers (2021-11-05T22:16:11Z)
- Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers [31.102181563065844]
Quantile (and, more generally, KL) regret bounds relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data.
More recently, the semi-adversarial paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of adversarial online learning by considering data that may be neither fully adversarial nor stochastic.
We achieve the minimax optimal regret in both paradigms using FTRL with separate, novel, root-logarithmic regularizers, both of which can be interpreted as yielding variants of NormalHedge.
arXiv Detail & Related papers (2021-10-27T22:38:52Z)
- High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points in high probability with the best-known iteration complexity for smooth losses.
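A minimal sketch of that gradient-clipping + momentum + normalized-step combination under assumed toy dynamics: the quadratic objective and Student-t gradient noise stand in for the paper's heavy-tailed setting, and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def heavy_tail_grad(w):
    """Stochastic gradient of f(w) = 0.5 * ||w||^2 with heavy-tailed noise."""
    return w + rng.standard_t(df=2.5, size=w.shape)

def clipped_momentum_sgd(w0, steps=2000, lr=0.05, beta=0.9, clip=5.0):
    """Clip each stochastic gradient, average with momentum, take a normalized step."""
    w, m = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        g = heavy_tail_grad(w)
        norm = np.linalg.norm(g)
        if norm > clip:                  # clipping tames heavy-tailed noise
            g = g * (clip / norm)
        m = beta * m + (1 - beta) * g    # momentum averaging
        w = w - lr * m / (np.linalg.norm(m) + 1e-12)   # normalized step
    return w

w_final = clipped_momentum_sgd(np.full(10, 5.0))
print(f"distance to the optimum: {np.linalg.norm(w_final):.3f}")
```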
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
- On the Oracle Complexity of Higher-Order Smooth Non-Convex Finite-Sum Optimization [1.6973426830397942]
We prove lower bounds for higher-order methods in smooth non-convex finite-sum optimization.
We show that a pth-order regularized method cannot profit from the finite-sum objective.
We show that a new second-order smoothness assumption can be seen as an analogue of the first-order mean-squared smoothness assumption.
arXiv Detail & Related papers (2021-03-08T23:33:58Z)
- Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency [83.02999769628593]
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning.
We show that the minimax approach enables us to achieve a fast rate of convergence for weights and quality functions.
We present the first finite-sample result with first-order efficiency in non-tabular environments.
arXiv Detail & Related papers (2021-02-05T03:20:39Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in a reproducing kernel Hilbert space (RKHS).
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
- Relaxing the I.I.D. Assumption: Adaptively Minimax Optimal Regret via Root-Entropic Regularization [16.536558038560695]
We consider prediction with expert advice when data are generated from distributions varying arbitrarily within an unknown constraint set.
The Hedge algorithm was recently shown to be simultaneously minimax optimal for i.i.d. data.
We provide matching upper and lower bounds on the minimax regret at all levels, show that Hedge with deterministic learning rates is suboptimal outside of the extremes, and prove that one can adaptively obtain minimax regret at all levels.
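As a baseline for the two expert-advice entries above, here is a self-contained sketch of plain Hedge (exponential weights) with the classical anytime learning rate sqrt(ln K / t); the root-entropic and root-logarithmic regularizers of these papers are more refined adaptive schemes, so this only illustrates the forecaster being adapted, run here on assumed i.i.d. Bernoulli losses.

```python
import numpy as np

rng = np.random.default_rng(5)

def hedge_regret(losses, eta_fn):
    """Run exponential-weights (Hedge) with a time-varying learning rate and
    return the regret against the best single expert in hindsight."""
    T, K = losses.shape
    cum_loss = np.zeros(K)
    total = 0.0
    for t in range(T):
        eta = eta_fn(t + 1, K)
        w = np.exp(-eta * (cum_loss - cum_loss.min()))  # shift for stability
        p = w / w.sum()
        total += p @ losses[t]           # expected loss of the forecaster
        cum_loss += losses[t]
    return total - cum_loss.min()

# I.i.d. Bernoulli losses for 10 experts (illustrative, not adversarial).
means = rng.uniform(0.2, 0.8, size=10)
losses = rng.binomial(1, means, size=(1000, 10)).astype(float)

anytime_eta = lambda t, K: np.sqrt(np.log(K) / t)   # classical anytime schedule
print(f"Hedge regret over 1000 rounds: {hedge_regret(losses, anytime_eta):.2f}")
```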
arXiv Detail & Related papers (2020-07-13T17:54:34Z)
- Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
- On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds under general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the regularization parameter.
arXiv Detail & Related papers (2019-04-18T02:56:00Z)