Instability, Computational Efficiency and Statistical Accuracy
- URL: http://arxiv.org/abs/2005.11411v2
- Date: Sun, 20 Mar 2022 05:25:25 GMT
- Title: Instability, Computational Efficiency and Statistical Accuracy
- Authors: Nhat Ho, Koulik Khamaru, Raaz Dwivedi, Martin J. Wainwright, Michael
I. Jordan, Bin Yu
- Abstract summary: We develop a framework that yields bounds on statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
- Score: 101.32305022521024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many statistical estimators are defined as the fixed point of a
data-dependent operator, with estimators based on minimizing a cost function
being an important special case. The limiting performance of such estimators
depends on the properties of the population-level operator in the idealized
limit of infinitely many samples. We develop a general framework that yields
bounds on statistical accuracy based on the interplay between the deterministic
convergence rate of the algorithm at the population level, and its degree of
(in)stability when applied to an empirical object based on $n$ samples. Using
this framework, we analyze both stable forms of gradient descent and some
higher-order and unstable algorithms, including Newton's method and its
cubic-regularized variant, as well as the EM algorithm. We provide applications
of our general results to several concrete classes of models, including
Gaussian mixture estimation, non-linear regression models, and informative
non-response models. We exhibit cases in which an unstable algorithm can
achieve the same statistical accuracy as a stable algorithm in exponentially
fewer steps -- namely, with the number of iterations being reduced from
polynomial to logarithmic in sample size $n$.
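To make the polynomial-versus-logarithmic contrast concrete, here is a minimal, self-contained sketch (not the paper's code) on the canonical over-specified Gaussian mixture: data drawn from $N(0,1)$ and fitted by the symmetric mixture $0.5\,N(\theta,1) + 0.5\,N(-\theta,1)$, whose population log-likelihood is locally quartic around $\theta^* = 0$. The step size, stopping radius, and finite-difference derivatives are illustrative choices.

```python
# Illustrative sketch: stable gradient descent vs. unstable Newton steps
# on an over-specified Gaussian location mixture (assumptions: step size
# 0.1, stopping radius n**-0.25, finite-difference derivatives).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.standard_normal(n)  # true parameter theta* = 0

def neg_loglik(theta):
    # negative log-likelihood of 0.5*N(theta,1) + 0.5*N(-theta,1),
    # additive constants dropped
    return -np.mean(np.logaddexp(-0.5 * (x - theta) ** 2,
                                 -0.5 * (x + theta) ** 2))

def grad(theta, eps=1e-5):   # central finite differences, for brevity
    return (neg_loglik(theta + eps) - neg_loglik(theta - eps)) / (2 * eps)

def hess(theta, eps=1e-4):
    return (neg_loglik(theta + eps) - 2 * neg_loglik(theta)
            + neg_loglik(theta - eps)) / eps ** 2

def run(update, theta0=1.0, max_iter=100_000):
    tol = n ** -0.25         # target statistical radius for this singular model
    theta, steps = theta0, 0
    while abs(theta) > tol and steps < max_iter:
        theta, steps = update(theta), steps + 1
    return steps

gd_steps = run(lambda t: t - 0.1 * grad(t))           # stable, many steps
newton_steps = run(lambda t: t - grad(t) / hess(t))   # unstable, few steps
print(f"gradient descent: {gd_steps} steps, Newton: {newton_steps} steps")
```

On typical draws, gradient descent needs several hundred iterations to enter the radius $n^{-1/4}$ while Newton's method needs fewer than ten, illustrating the speedup described above.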
Related papers
- Computationally efficient reductions between some statistical models [15.998213043947485]
We study the problem of transforming a sample from a source statistical model to a sample from a target statistical model without knowing the parameters of the source model.
We provide computationally efficient procedures that approximately reduce uniform, Erlang, and Laplace location models to general target families.
arXiv Detail & Related papers (2024-02-12T15:32:38Z)
- Convergence of uncertainty estimates in Ensemble and Bayesian sparse model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with a bootstrapping-based sequential thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform provably correct variable selection with an exponential convergence rate of the error rate.
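A minimal sketch of the idea as described in the summary: refit a sequential-thresholding least-squares estimator on bootstrap resamples and keep only variables selected in a large fraction of the fits. The thresholds, vote fraction, and function names below are illustrative choices, not the paper's.

```python
import numpy as np

def stlsq(A, y, thresh=0.1, iters=10):
    """Sequential thresholding least squares: alternate a least-squares
    fit with zeroing-out of small coefficients."""
    w = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(w) < thresh
        w[small] = 0.0
        big = ~small
        if big.any():
            w[big] = np.linalg.lstsq(A[:, big], y, rcond=None)[0]
    return w

def ensemble_stlsq(A, y, n_boot=100, keep_prob=0.9, seed=0):
    """Bootstrap-ensemble the STLSQ fit and vote on the support."""
    rng = np.random.default_rng(seed)
    n = len(y)
    votes = np.zeros(A.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # bootstrap resample
        votes += stlsq(A[idx], y[idx]) != 0
    support = votes / n_boot >= keep_prob          # stably selected variables
    w = np.zeros(A.shape[1])
    if support.any():
        w[support] = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
    return w, support
```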
arXiv Detail & Related papers (2023-01-30T04:07:59Z)
- Scalable Estimation for Structured Additive Distributional Regression [0.0]
We propose a novel backfitting algorithm based on the ideas of gradient descent, which can deal with virtually any amount of data on a conventional laptop.
Performance is evaluated using an extensive simulation study and an exceptionally challenging and unique example of lightning count prediction over Austria.
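A rough sketch of what a gradient-descent-flavoured backfitting cycle can look like, under strong simplifying assumptions (a binned piecewise-constant basis and squared-error loss rather than full distributional regression); this is an illustration of the general idea, not the authors' algorithm.

```python
import numpy as np

def backfit_gd(X, y, n_bins=10, lr=0.5, epochs=200):
    """Cycle through covariates, taking a gradient step on each
    component of the additive model y ~ b0 + sum_j f_j(x_j)."""
    n, p = X.shape
    # piecewise-constant (binned) basis per covariate, for simplicity
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
             for j in range(p)]
    Z = [np.digitize(X[:, j], edges[j]) for j in range(p)]  # bin index per sample
    beta = [np.zeros(n_bins) for _ in range(p)]
    b0 = y.mean()
    for _ in range(epochs):
        for j in range(p):                         # backfitting cycle
            fitted = b0 + sum(b[z] for b, z in zip(beta, Z))
            resid = y - fitted
            g = np.zeros(n_bins)
            np.add.at(g, Z[j], -2.0 * resid / n)   # d(MSE)/d(beta_j)
            beta[j] -= lr * g
            beta[j] -= beta[j].mean()              # center each f_j for identifiability
    return b0, beta
```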
arXiv Detail & Related papers (2023-01-13T14:59:42Z)
- Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees [57.67528738886731]
We study the numerical stability of scalable sparse approximations based on inducing points.
For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying the required minimum-separation conditions.
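For illustration, a greedy stand-in for such a procedure: keep a point only if it is at least eps away from every point kept so far. The paper's cover trees serve to perform this kind of minimum-separation selection efficiently; the quadratic-time pass below is only a sketch.

```python
import numpy as np

def min_separation_points(X, eps):
    """Greedily keep points whose distance to every kept point is >= eps.

    X : (n, d) array of candidate locations; returns a (k, d) subset
    whose pairwise distances are all at least eps by construction.
    """
    kept = [X[0]]
    for x in X[1:]:
        if np.min(np.linalg.norm(np.asarray(kept) - x, axis=1)) >= eps:
            kept.append(x)
    return np.asarray(kept)
```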
arXiv Detail & Related papers (2022-10-14T15:20:17Z)
- Adaptive LASSO estimation for functional hidden dynamic geostatistical model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HDGM).
The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selection operator (GMSOLAS) penalty function, wherein the weights are obtained from the unpenalised f-HDGM maximum likelihood estimators.
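A hedged sketch of the adaptive-LASSO weighting idea in a plain linear model (ordinary least squares stands in for the unpenalised f-HDGM fit; the proximal-gradient solver and all defaults are illustrative choices):

```python
import numpy as np

def adaptive_lasso(A, y, lam=0.1, gamma=1.0, n_iters=500):
    """Proximal gradient (ISTA) for 0.5/n * ||y - A b||^2 + lam * sum_j w_j |b_j|,
    with adaptive weights w_j = 1 / |pilot_j|**gamma from an unpenalised fit."""
    n, p = A.shape
    pilot = np.linalg.lstsq(A, y, rcond=None)[0]       # unpenalised pilot estimate
    w = 1.0 / (np.abs(pilot) ** gamma + 1e-12)         # adaptive penalty weights
    lr = n / np.linalg.norm(A, 2) ** 2                 # 1/L for the smooth part
    beta = np.zeros(p)
    for _ in range(n_iters):
        beta = beta - lr * A.T @ (A @ beta - y) / n    # gradient step
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam * w, 0.0)  # soft-threshold
    return beta
```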
arXiv Detail & Related papers (2022-08-10T19:17:45Z)
- Improving Computational Complexity in Statistical Models with Second-Order Information [32.64382185990981]
We study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models.
We demonstrate that the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius.
This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm.
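A sketch of one common form of normalized gradient descent, with each gradient step rescaled by the largest eigenvalue of the Hessian (the second-order information referred to above); the paper's exact scaling and step-size choices may differ.

```python
import numpy as np

def normgd(grad, hess, theta0, eta=1.0, tol=1e-8, max_iter=10_000):
    """Gradient descent with the step rescaled by the largest Hessian
    eigenvalue; grad and hess are user-supplied callables."""
    theta = np.asarray(theta0, dtype=float)
    for step in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:
            break
        lam_max = np.linalg.eigvalsh(hess(theta)).max()  # curvature scale
        theta = theta - eta * g / lam_max
    return theta, step
```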
arXiv Detail & Related papers (2022-02-09T01:32:50Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
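A minimal sketch of clipped stochastic gradient descent for one-dimensional streaming mean estimation; the clipping threshold and $1/t$ step sizes are illustrative choices rather than the paper's tuned schedule.

```python
import numpy as np

def clipped_sgd_mean(stream, clip=5.0, theta0=0.0):
    """Streaming mean estimation: one clipped gradient step per sample."""
    theta = float(theta0)
    for t, x in enumerate(stream, start=1):
        g = theta - x                 # gradient of 0.5 * (theta - x)^2
        if abs(g) > clip:
            g *= clip / abs(g)        # clip heavy-tailed gradients
        theta -= g / t                # decaying 1/t step size
    return theta

# e.g. with heavy-tailed Student-t (2.5 dof) samples:
# rng = np.random.default_rng(0)
# print(clipped_sgd_mean(rng.standard_t(2.5, size=100_000)))
```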
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of the stochasticity in its success is still unclear.
We show that heavy tails commonly arise in discrete-time stochastic optimization through multiplicative noise in the parameter updates.
A detailed analysis is conducted of the key factors, including step size and data, with similar results exhibited on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
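For reference, a sketch of TD(0) policy evaluation under a generative model; `sample_next` and `reward` are hypothetical user-supplied objects (a transition sampler and an expected-reward table), not the paper's notation.

```python
import numpy as np

def td0(sample_next, reward, n_states, gamma=0.9, n_rounds=5_000, seed=0):
    """TD(0) with generative-model access: one sampled transition per
    state per round, updating toward the one-step bootstrap target."""
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states)
    for t in range(1, n_rounds + 1):
        alpha = 1.0 / t                             # decaying step size
        for s in range(n_states):
            s_next = sample_next(s, rng)            # draw s' ~ P(. | s)
            target = reward[s] + gamma * V[s_next]  # bootstrap target
            V[s] += alpha * (target - V[s])
    return V
```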
arXiv Detail & Related papers (2020-03-16T17:15:28Z)