Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models
- URL: http://arxiv.org/abs/2409.07434v1
- Date: Wed, 11 Sep 2024 17:28:38 GMT
- Title: Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models
- Authors: Jiaqi Li, Johannes Schmidt-Hieber, Wei Biao Wu,
- Abstract summary: This paper proposes a theory for online inference of the gradient descent (SGD) iterates with dropout regularization in linear regression.
For sufficiently large samples, the proposed confidence intervals for ASGD with dropout nearly achieve the nominal coverage probability.
- Score: 8.555650549124818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes an asymptotic theory for online inference of the stochastic gradient descent (SGD) iterates with dropout regularization in linear regression. Specifically, we establish the geometric-moment contraction (GMC) for constant step-size SGD dropout iterates to show the existence of a unique stationary distribution of the dropout recursive function. By the GMC property, we provide quenched central limit theorems (CLT) for the difference between dropout and $\ell^2$-regularized iterates, regardless of initialization. The CLT for the difference between the Ruppert-Polyak averaged SGD (ASGD) with dropout and $\ell^2$-regularized iterates is also presented. Based on these asymptotic normality results, we further introduce an online estimator for the long-run covariance matrix of ASGD dropout to facilitate inference in a recursive manner with efficiency in computational time and memory. The numerical experiments demonstrate that for sufficiently large samples, the proposed confidence intervals for ASGD with dropout nearly achieve the nominal coverage probability.
Related papers
- Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling [5.448070998907116]
Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative of gradient descent.
In this paper we show that by computing $ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(ell wedge d)$.
We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution.
arXiv Detail & Related papers (2024-11-26T16:28:16Z) - Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation [22.652143194356864]
We address the problem of solving strongly convex and smooth problems using a descent gradient (SGD) algorithm with a constant step size.
We provide an expansion of the mean-squared error of the resulting estimator with respect to the number iterations of $n$.
We establish that this chain is geometrically ergodic with respect to a defined weighted Wasserstein semimetric.
arXiv Detail & Related papers (2024-10-07T15:02:48Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Implicit Bias of Gradient Descent for Logistic Regression at the Edge of
Stability [69.01076284478151]
In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS)
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
arXiv Detail & Related papers (2023-05-19T16:24:47Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the emphStochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by gradient descent (SGD)
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Online Covariance Matrix Estimation in Stochastic Gradient Descent [10.153224593032677]
gradient descent (SGD) is widely used for parameter estimation especially for huge data sets and online learning.
This paper aims at quantifying statistical inference of SGD-based estimates in an online setting.
arXiv Detail & Related papers (2020-02-10T17:46:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.