Statistical Inference for Linear Functionals of Online SGD in High-dimensional Linear Regression
- URL: http://arxiv.org/abs/2302.09727v2
- Date: Mon, 29 Apr 2024 00:53:53 GMT
- Title: Statistical Inference for Linear Functionals of Online SGD in High-dimensional Linear Regression
- Authors: Bhavya Agrawalla, Krishnakumar Balasubramanian, Promit Ghosal
- Abstract summary: We establish a high-dimensional Central Limit Theorem (CLT) for linear functionals of online stochastic gradient descent (SGD) iterates.
We develop an online approach for estimating the expectation and the variance terms appearing in the CLT, and establish high-probability bounds for the developed online estimator.
We propose a two-step fully online bias-correction methodology which, together with the CLT result and the variance estimation result, provides a fully online and data-driven way to numerically construct confidence intervals.
- Score: 14.521929085104441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent (SGD) has emerged as the quintessential method in a data scientist's toolbox. Using SGD for high-stakes applications requires, however, careful quantification of the associated uncertainty. Towards that end, in this work, we establish a high-dimensional Central Limit Theorem (CLT) for linear functionals of online SGD iterates for overparametrized least-squares regression with non-isotropic Gaussian inputs. Our result shows that a CLT holds even when the dimensionality is of order exponential in the number of iterations of the online SGD, which, to the best of our knowledge, is the first such result. In order to use the developed result in practice, we further develop an online approach for estimating the expectation and the variance terms appearing in the CLT, and establish high-probability bounds for the developed online estimator. Furthermore, we propose a two-step fully online bias-correction methodology which, together with the CLT result and the variance estimation result, provides a fully online and data-driven way to numerically construct confidence intervals, thereby enabling practical high-dimensional algorithmic inference with SGD. We also extend our results to a class of single-index models, based on Gaussian Stein's identity. Finally, we provide numerical simulations to verify our theoretical findings in practice.
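As a rough, self-contained illustration of the kind of procedure the abstract describes (a generic sketch, not the paper's estimators: the constant step size, the choice of functional, the burn-in, and the batch-means variance plug-in below are all placeholder assumptions), one could track a linear functional of the online SGD iterates and form a normal confidence interval as follows:

```python
import numpy as np

# Minimal sketch (not the paper's estimators): online SGD for least squares,
# tracking a linear functional v^T theta_t of the iterates, with a standard
# batch-means plug-in for a rough 95% normal confidence interval. Dimension,
# step size, burn-in, and batch count below are illustrative assumptions.

rng = np.random.default_rng(0)
d, n, eta = 50, 20_000, 0.01
theta_star = rng.normal(size=d) / np.sqrt(d)
v = np.zeros(d); v[0] = 1.0                 # functional of interest: first coordinate

theta = np.zeros(d)
vals = np.empty(n)
for t in range(n):
    x = rng.normal(size=d)                  # Gaussian inputs (isotropic, for simplicity)
    y = x @ theta_star + rng.normal()
    theta += eta * (y - x @ theta) * x      # one-pass SGD update
    vals[t] = v @ theta

# Discard a burn-in, then use non-overlapping batch means to account for the
# autocorrelation of the iterates when estimating the variance of the average.
tail = vals[n // 2:]
num_batches = 40
m = len(tail) // num_batches
batch_means = tail[: num_batches * m].reshape(num_batches, m).mean(axis=1)
center = tail.mean()
se = batch_means.std(ddof=1) / np.sqrt(num_batches)
print(f"estimate {center:.3f}, 95% CI ({center - 1.96 * se:.3f}, {center + 1.96 * se:.3f})")
print(f"target   {v @ theta_star:.3f}")
```

The interval above is only a heuristic check; the paper instead develops fully online estimators of the CLT mean and variance together with a two-step bias correction for constructing valid confidence intervals.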
Related papers
- On the Convergence of DP-SGD with Adaptive Clipping [56.24689348875711]
Stochastic gradient descent with gradient clipping is a powerful technique for enabling differentially private optimization.
This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD).
We show that QC-SGD suffers from a bias problem similar to constant-threshold clipped SGD, but that this bias can be mitigated through a carefully designed quantile and step-size schedule.
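For intuition, here is a toy sketch of a quantile-clipped SGD step; the running-quantile update, target quantile, and learning rates are illustrative assumptions, no privacy noise is added, and this is not the QC-SGD schedule analyzed in the paper.

```python
import numpy as np

# Toy sketch of a quantile-clipped SGD step (illustrative assumptions only;
# not the QC-SGD schedule analyzed in the paper, and no privacy noise is added).

def quantile_clipped_step(theta, per_example_grads, q_hat,
                          eta=0.1, rho=0.05, target_q=0.9):
    """One update: clip per-example gradient norms at the running quantile estimate q_hat."""
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, q_hat / np.maximum(norms, 1e-12))
    theta = theta - eta * (per_example_grads * scale[:, None]).mean(axis=0)
    # Stochastic-approximation tracking of the target_q quantile of gradient norms.
    q_hat = q_hat + rho * (target_q - (norms <= q_hat).mean())
    return theta, q_hat
```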
arXiv Detail & Related papers (2024-12-27T20:29:47Z) - Adaptive debiased SGD in high-dimensional GLMs with streaming data [4.704144189806667]
We introduce a novel approach to online inference in high-dimensional generalized linear models.
Our method operates in a single-pass mode, significantly reducing both time and space complexity.
We demonstrate that our method, termed the Approximated Debiased Lasso (ADL), not only relaxes the bounded individual probability condition but also significantly improves numerical performance.
arXiv Detail & Related papers (2024-05-28T15:36:48Z) - Online Tensor Inference [0.0]
Traditional offline learning, involving the storage and utilization of all data in each computational iteration, becomes impractical for high-dimensional tensor data.
Existing low-rank tensor methods lack the capability for statistical inference in an online fashion.
Our approach employs stochastic gradient descent (SGD) to enable efficient real-time data processing without extensive memory requirements.
arXiv Detail & Related papers (2023-12-28T16:37:48Z) - Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z) - Kalman Filter for Online Classification of Non-Stationary Data [101.26838049872651]
In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments in multi-class classification we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.
arXiv Detail & Related papers (2023-06-14T11:41:42Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Implicit Bias of Gradient Descent for Logistic Regression at the Edge of
Stability [69.01076284478151]
In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS).
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
arXiv Detail & Related papers (2023-05-19T16:24:47Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs are prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating the Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP).
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z) - SketchySGD: Reliable Stochastic Optimization via Randomized Curvature
Estimates [19.420605210427635]
SketchySGD improves upon existing gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian.
We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum.
In the ill-conditioned setting we show SketchySGD converges at a faster rate than SGD for least-squares problems.
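To make the idea concrete, here is an illustrative sketch (not the SketchySGD algorithm itself) of preconditioning with a randomized low-rank approximation of a subsampled least-squares Hessian, inverted through the Woodbury identity; the rank, the regularizer, and the range finder are placeholder assumptions.

```python
import numpy as np

# Illustrative sketch, not SketchySGD itself: build a randomized low-rank
# approximation U diag(s) U^T of a subsampled least-squares Hessian
# H = X_sub^T X_sub / m, and apply (U diag(s) U^T + rho*I)^{-1} to gradients
# via the Woodbury identity. Rank, rho, and the range finder are placeholders.

def low_rank_hessian(X_sub, rank, rng):
    m, d = X_sub.shape
    Omega = rng.normal(size=(d, rank))
    Q, _ = np.linalg.qr(X_sub.T @ (X_sub @ Omega) / m)   # approximate range of H
    core = Q.T @ (X_sub.T @ (X_sub @ Q)) / m             # rank x rank projection of H
    s, V = np.linalg.eigh(core)
    return Q @ V, np.maximum(s, 0.0)                     # U (d x rank), eigenvalues s

def preconditioned_grad(g, U, s, rho):
    # (U diag(s) U^T + rho*I)^{-1} g, using U^T U = I and the Woodbury identity.
    return g / rho - U @ ((U.T @ g) * (s / (rho * (s + rho))))
```

In an SGD loop, one would refresh (U, s) occasionally from a fresh minibatch and take steps of the form theta -= lr * preconditioned_grad(g, U, s, rho).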
arXiv Detail & Related papers (2022-11-16T01:05:41Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Experimental Design for Linear Functionals in Reproducing Kernel Hilbert
Spaces [102.08678737900541]
We provide algorithms for constructing bias-aware designs for linear functionals.
We derive non-asymptotic confidence sets for fixed and adaptive designs under sub-Gaussian noise.
arXiv Detail & Related papers (2022-05-26T20:56:25Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Fast and Robust Online Inference with Stochastic Gradient Descent via
Random Scaling [0.9806910643086042]
We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent algorithms.
Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem.
arXiv Detail & Related papers (2021-06-06T15:38:37Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - Online Covariance Matrix Estimation in Stochastic Gradient Descent [10.153224593032677]
Stochastic gradient descent (SGD) is widely used for parameter estimation, especially for huge data sets and online learning.
This paper aims at quantifying statistical inference of SGD-based estimates in an online setting.
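As a rough illustration of what such online inference machinery can look like (a generic batch-means construction, not the estimator proposed in the paper; the fixed batch length is a placeholder assumption), one can maintain an online estimate of the long-run covariance of the SGD iterates:

```python
import numpy as np

# Generic batch-means construction, not the paper's estimator: maintain, online,
# an estimate of the long-run covariance Sigma of the SGD iterates; the covariance
# of the Polyak-Ruppert average of n iterates is then roughly Sigma / n.

class OnlineBatchMeansCovariance:
    def __init__(self, dim, batch_len=200):
        self.batch_len = batch_len
        self.batch_sum = np.zeros(dim)
        self.in_batch = 0
        self.batch_means = []          # O(n / batch_len) memory

    def update(self, theta_t):
        """Feed one SGD iterate theta_t."""
        self.batch_sum += theta_t
        self.in_batch += 1
        if self.in_batch == self.batch_len:
            self.batch_means.append(self.batch_sum / self.batch_len)
            self.batch_sum = np.zeros_like(self.batch_sum)
            self.in_batch = 0

    def long_run_covariance(self):
        B = np.asarray(self.batch_means)
        # Var(batch mean) is roughly Sigma / batch_len, so rescale by batch_len.
        return self.batch_len * np.cov(B, rowvar=False)
```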
arXiv Detail & Related papers (2020-02-10T17:46:10Z) - HiGrad: Uncertainty Quantification for Online Learning and Stochastic Approximation [27.77529637229548]
Stochastic gradient descent (SGD) is an immensely popular approach for online learning in settings where data arrives in a stream or data sizes are very large.
This paper introduces a novel procedure termed HiGrad to conduct statistical inference for online learning.
An R package, higrad, has been developed to implement the method.
arXiv Detail & Related papers (2018-02-13T22:15:10Z)