Learning to Stop with Surprisingly Few Samples
- URL: http://arxiv.org/abs/2102.10025v2
- Date: Mon, 22 Feb 2021 04:25:24 GMT
- Title: Learning to Stop with Surprisingly Few Samples
- Authors: Daniel Russo, Assaf Zeevi, Tianyi Zhang
- Abstract summary: We consider a discounted infinite horizon optimal stopping problem.
If the underlying distribution is known a priori, the solution of this problem is obtained via dynamic programming.
When information on this distribution is lacking, a natural (though naive) approach is "explore-then-exploit."
- Score: 17.46537996825982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a discounted infinite horizon optimal stopping problem. If the
underlying distribution is known a priori, the solution of this problem is
obtained via dynamic programming (DP) and is given by a well known threshold
rule. When information on this distribution is lacking, a natural (though
naive) approach is "explore-then-exploit," whereby the unknown distribution or
its parameters are estimated over an initial exploration phase, and this
estimate is then used in the DP to determine actions over the residual
exploitation phase. We show: (i) with proper tuning, this approach leads to
performance comparable to the full information DP solution; and (ii) despite
common wisdom on the sensitivity of such "plug in" approaches in DP due to
propagation of estimation errors, a surprisingly "short" (logarithmic in the
horizon) exploration horizon suffices to obtain said performance. In cases
where the underlying distribution is heavy-tailed, these observations are even
more pronounced: a ${\it single \, sample}$ exploration phase suffices.
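
The recipe in the abstract is concrete enough to sketch. Below is a minimal, hedged illustration for one classic instance of the setting, the discounted house-selling problem with i.i.d. offers, where the DP threshold $\tau^*$ solves the fixed point $\tau = \gamma\,\mathbb{E}[\max(X, \tau)]$ and the plug-in rule replaces the expectation with an empirical mean over a short exploration prefix. The instance, the function names, and the logarithmic sizing of the exploration phase are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def empirical_threshold(samples, gamma=0.95, iters=200):
    """Plug-in DP threshold: iterate the contraction
    tau <- gamma * E_hat[max(X, tau)], with E_hat the empirical mean
    over the exploration samples."""
    tau = 0.0
    for _ in range(iters):
        tau = gamma * np.maximum(samples, tau).mean()
    return tau

def explore_then_exploit(offers, gamma=0.95, n_explore=None):
    """Estimate the threshold on a short prefix, then stop at the first
    later offer that clears it."""
    offers = np.asarray(offers, dtype=float)
    if n_explore is None:
        # logarithmic in the effective horizon 1/(1 - gamma), per the abstract
        n_explore = max(1, int(np.ceil(np.log(1.0 / (1.0 - gamma)))))
    tau = empirical_threshold(offers[:n_explore], gamma=gamma)
    for t in range(n_explore, len(offers)):
        if offers[t] >= tau:
            return t, offers[t]
    return len(offers) - 1, offers[-1]  # truncation fallback for a finite demo
```

Note that the loop still runs with `n_explore=1`, which loosely mirrors the abstract's single-sample observation for heavy-tailed distributions, though nothing here reproduces the paper's analysis.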
Related papers
- Active Bipartite Ranking with Smooth Posterior Distributions [1.9838140219494644]
Bipartite ranking is a statistical learning problem that arises in many applications and has been widely studied in the passive context. We propose a novel algorithm, referred to as smooth-rank and designed for the continuous setting, which aims to minimise the distance between the ROC curve of the estimated ranking rule and the optimal one with respect to the $\sup$ norm. We provide a problem-dependent upper bound on the expected sampling time of smooth-rank and establish a problem-dependent lower bound on the expected sampling time of any PAC algorithm.
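The algorithm itself is not reproduced in this summary; as a hedged aid, the sketch below only evaluates the objective named above, the sup-norm distance between the ROC curve of an estimated ranking rule and that of the optimal one. The function names and grid resolution are illustrative.

```python
import numpy as np

def tpr_at_fpr(scores_pos, scores_neg, grid):
    """TPR as a function of FPR: thresholds at negative-score quantiles
    trace out the ROC curve on a common FPR grid."""
    thresholds = np.quantile(scores_neg, 1.0 - grid)
    return np.array([(scores_pos >= t).mean() for t in thresholds])

def sup_norm_roc_gap(est_pos, est_neg, opt_pos, opt_neg, n_grid=512):
    """Sup-norm distance between the estimated and optimal ROC curves --
    the quantity smooth-rank aims to minimise."""
    grid = np.linspace(0.0, 1.0, n_grid)
    return np.abs(tpr_at_fpr(est_pos, est_neg, grid)
                  - tpr_at_fpr(opt_pos, opt_neg, grid)).max()
```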
arXiv Detail & Related papers (2026-02-27T18:32:08Z)
- Consistent Estimation of Numerical Distributions under Local Differential Privacy by Wavelet Expansion [33.04753728429468]
We propose a new approach that expresses the sample distribution using wavelet expansions. Our method prioritizes the estimation of low-order coefficients in order to ensure accurate estimation at the macroscopic level. Experiments show that our wavelet expansion method significantly outperforms existing solutions under Wasserstein and KS distances.
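As a hedged sketch of the representation (ignoring the local-DP randomization, which the summary does not detail), each low-order Haar coefficient $c_{j,k} = \mathbb{E}[\psi_{j,k}(X)]$ can be estimated by a sample mean and the density reconstructed from the truncated expansion. The level cutoff and names are illustrative.

```python
import numpy as np

def haar_psi(j, k, x):
    """Haar wavelet psi_{j,k}(x) = 2^{j/2} * psi(2^j * x - k) on [0, 1)."""
    y = (2.0 ** j) * x - k
    return (2.0 ** (j / 2.0)) * (((0.0 <= y) & (y < 0.5)).astype(float)
                                 - ((0.5 <= y) & (y < 1.0)).astype(float))

def wavelet_density_estimate(samples, max_level=4, grid_size=512):
    """Truncated Haar expansion of a density on [0, 1): low-order
    coefficients are plug-in sample means, which pins down the
    macroscopic shape first."""
    x = np.linspace(0.0, 1.0, grid_size, endpoint=False)
    density = np.ones_like(x)  # scaling coefficient of a density on [0,1) is 1
    for j in range(max_level):
        for k in range(2 ** j):
            c_jk = haar_psi(j, k, samples).mean()  # c_{j,k} = E[psi_{j,k}(X)]
            density += c_jk * haar_psi(j, k, x)
    return x, density
```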
arXiv Detail & Related papers (2025-09-24T00:37:22Z)
- Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift [10.35788775775647]
We investigate the minimax lower bound of the problem when the source and target distributions are known. Specifically, the two-stage algorithm first trains an optimal estimator for the function under the source distribution, and then uses a likelihood-ratio reweighting procedure to calibrate the moment estimator. To solve this problem, we propose a truncated version of the estimator that ensures double robustness and provide the corresponding upper bound.
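A hedged sketch of that two-stage recipe, with a plain least-squares fit standing in for the paper's stage-one estimator; all names and the clipping level are illustrative assumptions.

```python
import numpy as np

def two_stage_moment(x_src, y_src, density_ratio, clip=10.0):
    """Stage 1: fit a regression f_hat on source data. Stage 2: estimate the
    target moment E_target[f(X)] by reweighting source points with the
    likelihood ratio w(x) = p_target(x) / p_source(x), truncated at `clip`
    so that extreme weights cannot dominate."""
    X = np.column_stack([np.ones_like(x_src), x_src])  # linear model assumed
    coef, *_ = np.linalg.lstsq(X, y_src, rcond=None)
    f_hat = X @ coef                           # predictions at source points
    w = np.minimum(density_ratio(x_src), clip) # truncated likelihood ratio
    return np.mean(w * f_hat)
```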
arXiv Detail & Related papers (2025-06-30T01:32:36Z)
- An Improved Algorithm for Learning Drifting Discrete Distributions [2.2191203337341525]
We present a new adaptive algorithm for learning discrete distributions under distribution drift.
We observe a sequence of independent samples from a discrete distribution that is changing over time, and the goal is to estimate the current distribution.
To use more samples, we must resort to samples further in the past, and we incur a drift error due to the bias introduced by the change in distribution.
We present a novel adaptive algorithm that resolves this trade-off without any prior knowledge of the drift.
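The paper's algorithm is not spelled out in the summary; the sketch below illustrates the trade-off with a generic drift-agnostic heuristic, growing the lookback window until larger windows become statistically inconsistent with smaller ones. The doubling schedule and tolerance are assumptions, not the paper's method.

```python
import numpy as np

def adaptive_window_estimate(stream, support_size):
    """Estimate the *current* distribution of a drifting stream. A window of
    the r most recent samples has statistical error ~ sqrt(support_size / r)
    but drift error growing in r; keep doubling the window while its
    empirical distribution stays within sampling noise of the smaller one."""
    stream = np.asarray(stream)
    n = len(stream)
    empirical = lambda r: np.bincount(stream[n - r:], minlength=support_size) / r
    best_r = 1
    for r in (2 ** i for i in range(1, int(np.log2(n)) + 1)):
        tol = np.sqrt(support_size / best_r) + np.sqrt(support_size / r)
        # a total-variation gap beyond sampling noise signals drift: stop
        if 0.5 * np.abs(empirical(r) - empirical(best_r)).sum() > tol:
            break
        best_r = r
    return empirical(best_r)
```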
arXiv Detail & Related papers (2024-03-08T16:54:27Z)
- Estimation Beyond Data Reweighting: Kernel Method of Moments [9.845144212844662]
We provide an empirical likelihood estimator based on maximum mean discrepancy, which we term the kernel method of moments (KMM).
We show that our method achieves competitive performance on several conditional moment restriction tasks.
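The estimator itself involves an optimization over reweightings that the summary does not detail; as a hedged building block, the maximum mean discrepancy that KMM substitutes for the usual empirical-likelihood divergence can be computed as below. The RBF kernel and bandwidth are illustrative choices.

```python
import numpy as np

def mmd_squared(x, y, bandwidth=1.0):
    """Plug-in (V-statistic) estimate of squared maximum mean discrepancy
    between one-dimensional samples x and y under an RBF kernel."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2
                            / (2.0 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()
```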
arXiv Detail & Related papers (2023-05-18T11:52:43Z)
- Posterior-Variance-Based Error Quantification for Inverse Problems in Imaging [8.510101522152231]
The proposed method employs estimates of the posterior variance together with techniques from conformal prediction.
The coverage guarantees can also be obtained even when only approximate sampling from the posterior is possible.
Experiments with multiple regularization approaches presented in the paper confirm that in practice, the obtained error bounds are rather tight.
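One standard way to combine the two ingredients the summary names (a hedged sketch, not the paper's exact procedure) is split conformal prediction with variance-normalized residuals:

```python
import numpy as np

def calibrated_std_multiplier(pred_mean, pred_std, truth, alpha=0.1):
    """On a held-out calibration set, take the conformal quantile q of the
    normalized residual |truth - mean| / std; intervals mean +- q * std then
    cover fresh points with probability about 1 - alpha."""
    scores = np.abs(truth - pred_mean) / pred_std
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    return np.quantile(scores, level)
```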
arXiv Detail & Related papers (2022-12-23T17:45:38Z)
- ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference [54.17205151960878]
We introduce a sampling-free approach that is generic and easy to deploy.
We produce reliable uncertainty estimates on par with state-of-the-art methods at a significantly lower computational cost.
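The summary leaves the two inference steps unspecified; under the assumption (hedged, taken from the title's "two-step inference") that the second pass feeds the first prediction back as an extra input, the scheme looks roughly like this, with `model(x, y)` an assumed interface:

```python
import numpy as np

def two_step_uncertainty(model, x, placeholder):
    """Sampling-free uncertainty from self-consistency: run the model with a
    placeholder second input, then again conditioned on its own prediction;
    the discrepancy between the two passes is the uncertainty score."""
    y1 = model(x, placeholder)  # first pass
    y2 = model(x, y1)           # second pass, fed its own output
    return y1, float(np.linalg.norm(y2 - y1))
```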
arXiv Detail & Related papers (2022-11-21T13:23:09Z)
- On the Pitfalls of Heteroscedastic Uncertainty Estimation with Probabilistic Neural Networks [23.502721524477444]
We present a synthetic example illustrating how this approach can lead to very poor but stable estimates.
We identify the culprit to be the log-likelihood loss, along with certain conditions that exacerbate the issue.
We present an alternative formulation, termed $\beta$-NLL, in which each data point's contribution to the loss is weighted by the $\beta$-exponentiated variance estimate.
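The weighting is concrete enough to state in code. A minimal sketch of a $\beta$-NLL loss for a Gaussian head, with a stop-gradient on the weight so it rescales rather than redirects the objective; the stop-gradient and the default $\beta$ are modeling choices consistent with the summary.

```python
import torch

def beta_nll_loss(mean, var, target, beta=0.5):
    """Per-point Gaussian NLL weighted by the beta-exponentiated variance
    estimate. beta = 0 recovers plain NLL; larger beta keeps poorly fit,
    high-variance points from being down-weighted into stable but bad fits."""
    nll = 0.5 * (torch.log(var) + (target - mean) ** 2 / var)
    weight = var.detach() ** beta  # stop-gradient: weight, don't optimize
    return (weight * nll).mean()
```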
arXiv Detail & Related papers (2022-03-17T08:46:17Z)
- Under-Approximating Expected Total Rewards in POMDPs [68.8204255655161]
We consider the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP).
We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well.
arXiv Detail & Related papers (2022-01-21T16:43:03Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum-variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
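The summary's "bias constraints" suggest augmenting the usual loss with a penalty on the estimated bias; the sketch below is one such reading. The batch-mean bias proxy and the weight `lam` are assumptions, not the paper's exact BCE loss.

```python
import torch

def bias_constrained_loss(pred, target, lam=1.0):
    """MSE plus a squared-bias penalty. Within a batch generated at a fixed
    true parameter value, the mean error estimates the estimator's bias,
    which the penalty drives toward zero."""
    err = pred - target
    return (err ** 2).mean() + lam * err.mean() ** 2
```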
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- On the Practicality of Deterministic Epistemic Uncertainty [106.06571981780591]
Deterministic uncertainty methods (DUMs) achieve strong performance on detecting out-of-distribution data.
It remains unclear whether DUMs are well calibrated and can seamlessly scale to real-world applications.
arXiv Detail & Related papers (2021-07-01T17:59:07Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, one can obtain strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
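For context, one backward regression step of least-squares value iteration with linear features, the algorithm family these guarantees attach to, looks as follows. This is a generic sketch: shapes and the ridge term are illustrative, and in the reward-free setting `rewards` is whatever (linear) reward is plugged in after exploration.

```python
import numpy as np

def lsvi_step(phi_sa, rewards, phi_next_all, w_next, reg=1.0):
    """Regress Bellman targets onto features: w_h solves the ridge problem
    min_w sum_i (phi(s_i, a_i) @ w - r_i - max_a phi(s'_i, a) @ w_next)^2.
    phi_sa: (n, d); phi_next_all: (n, A, d) next-state features per action."""
    targets = rewards + (phi_next_all @ w_next).max(axis=1)  # r + max_a Q_{h+1}
    d = phi_sa.shape[1]
    gram = phi_sa.T @ phi_sa + reg * np.eye(d)
    return np.linalg.solve(gram, phi_sa.T @ targets)
```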
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- A maximum-entropy approach to off-policy evaluation in average-reward MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
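A hedged illustration of that exponential-family structure: fitting weights of the form $p_\theta(i) \propto \exp(\theta^\top \phi_i)$ over a finite state-action set by dual gradient ascent, so that feature expectations match given moments. The moments would come from the MDP's stationarity constraints, which are omitted here.

```python
import numpy as np

def max_entropy_fit(phi, target_moments, lr=0.1, iters=2000):
    """Maximum-entropy distribution matching feature moments: among all p
    with E_p[phi] = target_moments, the entropy maximizer is an exponential
    family with phi as sufficient statistics; ascend its dual."""
    theta = np.zeros(phi.shape[1])
    for _ in range(iters):
        logits = phi @ theta
        p = np.exp(logits - logits.max())
        p /= p.sum()
        theta += lr * (target_moments - p @ phi)  # dual gradient: moment gap
    return p, theta
```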
arXiv Detail & Related papers (2020-06-17T18:13:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.