Statistical Inference for Model Parameters in Stochastic Gradient
Descent
- URL: http://arxiv.org/abs/1610.08637v4
- Date: Wed, 1 Nov 2023 08:54:26 GMT
- Title: Statistical Inference for Model Parameters in Stochastic Gradient
Descent
- Authors: Xi Chen and Jason D. Lee and Xin T. Tong and Yichen Zhang
- Abstract summary: The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency.
We investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions.
- Score: 45.29532403359099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The stochastic gradient descent (SGD) algorithm has been widely used in
statistical estimation for large-scale data due to its computational and memory
efficiency. While most existing works focus on the convergence of the objective
function or the error of the obtained solution, we investigate the problem of
statistical inference of true model parameters based on SGD when the population
loss function is strongly convex and satisfies certain smoothness conditions.
Our main contributions are two-fold. First, in the fixed dimension setup, we
propose two consistent estimators of the asymptotic covariance of the average
iterate from SGD: (1) a plug-in estimator, and (2) a batch-means estimator,
which is computationally more efficient and only uses the iterates from SGD.
Both proposed estimators allow us to construct asymptotically exact confidence
intervals and hypothesis tests. Second, for high-dimensional linear regression,
using a variant of the SGD algorithm, we construct a debiased estimator of each
regression coefficient that is asymptotically normal. This gives a one-pass
algorithm for computing both the sparse regression coefficients and confidence
intervals, which is computationally attractive and applicable to online data.
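To make the first contribution concrete, the following is a minimal Python sketch of Polyak-Ruppert averaged SGD for linear regression together with a batch-means estimate of the asymptotic covariance and the resulting per-coordinate confidence intervals. The least-squares loss, the polynomial step-size schedule, the equal-length batches, and the fixed 1.96 normal quantile are illustrative assumptions, not the exact estimator or tuning analyzed in the paper.

```python
import numpy as np

def averaged_sgd_batch_means(X, y, step0=0.5, alpha=0.505, num_batches=20):
    """One pass of SGD for least squares, with a batch-means covariance estimate.

    Sketch only: step size step0 * t^(-alpha), equal-length batches of the
    iterate path, and no burn-in, unlike the refinements one would use in practice.
    """
    n, d = X.shape
    theta = np.zeros(d)
    iterates = np.empty((n, d))
    for t in range(n):
        lr = step0 * (t + 1) ** (-alpha)
        grad = (X[t] @ theta - y[t]) * X[t]   # gradient of 0.5 * (x'theta - y)^2
        theta -= lr * grad
        iterates[t] = theta

    theta_bar = iterates.mean(axis=0)          # Polyak-Ruppert average

    # Batch means: block means of the iterate path, centered at the overall
    # average and rescaled by the block length, estimate Cov(sqrt(n) * theta_bar).
    blocks = np.array_split(iterates, num_batches)
    block_means = np.array([b.mean(axis=0) for b in blocks])
    block_len = n // num_batches
    diffs = block_means - theta_bar
    cov_hat = block_len * (diffs.T @ diffs) / (num_batches - 1)
    return theta_bar, cov_hat

def confidence_intervals(theta_bar, cov_hat, n, z=1.96):
    """Per-coordinate ~95% normal confidence intervals."""
    se = np.sqrt(np.diag(cov_hat) / n)
    return np.stack([theta_bar - z * se, theta_bar + z * se], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 100_000, 5
    theta_star = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ theta_star + rng.normal(size=n)

    theta_bar, cov_hat = averaged_sgd_batch_means(X, y)
    print(confidence_intervals(theta_bar, cov_hat, n).round(3))
```

The plug-in alternative would instead estimate the Hessian and the gradient covariance at the averaged iterate and combine them through a sandwich formula; the batch-means route above only touches the SGD iterates themselves, which is what makes it attractive when the data arrive as a stream. The debiased one-pass estimator for the high-dimensional sparse setting is not sketched here.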
Related papers
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Stochastic Mirror Descent for Large-Scale Sparse Recovery [13.500750042707407]
We discuss an application of Stochastic Approximation to the statistical estimation of high-dimensional sparse parameters.
We show that the proposed algorithm attains the optimal rate of convergence of the estimation error under weak assumptions on the regressor distribution.
arXiv Detail & Related papers (2022-10-23T23:23:23Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are required, through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - An Exponentially Increasing Step-size for Parameter Estimation in
Statistical Models [37.63410634069547]
We propose to exponentially increase the step-size of the gradient descent (GD) algorithm.
We then consider using the EGD algorithm for solving parameter estimation under non-regular statistical models.
The total computational complexity of the EGD algorithm is optimal and exponentially cheaper than that of GD for solving parameter estimation in non-regular statistical models (a toy sketch of the increasing step-size idea appears after this list).
arXiv Detail & Related papers (2022-05-16T21:36:22Z) - Improving Computational Complexity in Statistical Models with
Second-Order Information [32.64382185990981]
We study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models.
We demonstrate that the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius.
This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm.
arXiv Detail & Related papers (2022-02-09T01:32:50Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped stochastic gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the stochastic gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Fast and Robust Online Inference with Stochastic Gradient Descent via
Random Scaling [0.9806910643086042]
We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent (SGD) algorithms.
Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem.
arXiv Detail & Related papers (2021-06-06T15:38:37Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy guarantees based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Online Covariance Matrix Estimation in Stochastic Gradient Descent [10.153224593032677]
Stochastic gradient descent (SGD) is widely used for parameter estimation, especially for huge data sets and online learning.
This paper aims at conducting statistical inference of SGD-based estimates in an online setting.
arXiv Detail & Related papers (2020-02-10T17:46:10Z)
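As referenced in the exponentially increasing step-size entry above, here is a toy Python comparison of fixed-step gradient descent against a variant whose step-size grows geometrically, on the flat loss f(theta) = theta**4 whose curvature vanishes at the optimum (a stand-in for the non-regular regime). The loss, the growth factor rho, and the iteration count are illustrative assumptions, not the EGD algorithm or guarantees from that paper.

```python
def run(theta0=1.0, eta0=0.05, rho=2.0, iters=50):
    """Fixed-step GD vs. a geometrically increasing step-size on f(theta) = theta**4,
    a toy loss whose curvature vanishes at the optimum (the 'non-regular' regime)."""
    grad = lambda t: 4.0 * t ** 3                            # f'(theta) for f(theta) = theta**4
    theta_fixed, theta_inc = theta0, theta0
    for t in range(iters):
        theta_fixed -= eta0 * grad(theta_fixed)              # constant step-size
        theta_inc -= eta0 * (rho ** t) * grad(theta_inc)     # step-size eta0 * rho**t
    return theta_fixed, theta_inc

if __name__ == "__main__":
    fixed, inc = run()
    print(f"|theta| after 50 steps: fixed step {abs(fixed):.3e}, increasing step {abs(inc):.3e}")
```

On this toy loss the fixed-step iterate is still around 0.2 after 50 steps, while the increasing-step iterate contracts geometrically to the order of 1e-8, mirroring the entry's claim that the exponential schedule is dramatically cheaper in the non-regular regime.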