On the Double Descent of Random Features Models Trained with SGD
- URL: http://arxiv.org/abs/2110.06910v1
- Date: Wed, 13 Oct 2021 17:47:39 GMT
- Title: On the Double Descent of Random Features Models Trained with SGD
- Authors: Fanghui Liu, Johan A.K. Suykens, Volkan Cevher
- Abstract summary: We study the generalization properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
- Score: 78.0918823643911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study generalization properties of random features (RF) regression in high
dimensions optimized by stochastic gradient descent (SGD). In this regime, we
derive precise non-asymptotic error bounds of RF regression under both constant
and adaptive step-size SGD settings, and observe the double descent phenomenon
both theoretically and empirically. Our analysis shows how to cope with
multiple randomness sources of initialization, label noise, and data sampling
(as well as stochastic gradients) with no closed-form solution, and also goes
beyond the commonly-used Gaussian/spherical data assumption. Our theoretical
results demonstrate that, with SGD training, RF regression still generalizes
well for interpolation learning, and is able to characterize the double descent
behavior by the unimodality of variance and monotonic decrease of bias.
We also prove that the constant step-size SGD setting incurs no loss
in convergence rate when compared with the exact minimal-norm interpolator, which
provides a theoretical justification for using SGD in practice.
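Below is a minimal sketch (not the authors' code) of the setting the abstract describes: RF regression trained with single-sample, constant step-size SGD, with the number of random features N swept across the interpolation threshold N ≈ n so that the double descent shape of the test error can be observed empirically. The ReLU feature map, data distribution, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the authors' code): random features (RF)
# regression trained with constant step-size SGD, sweeping the number of
# features N across the interpolation threshold N ~ n to expose double descent.
import numpy as np

rng = np.random.default_rng(0)
n, d, noise = 200, 20, 0.5                      # samples, input dim, label-noise std (assumed)
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ w_star + noise * rng.standard_normal(n)
X_test = rng.standard_normal((2000, d))
y_test = X_test @ w_star

def rf_map(X, W):
    """Random ReLU features scaled by 1/sqrt(N) (one common RF choice)."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])

def sgd_rf(X, y, N, step=0.5, epochs=100):
    """Fit the linear layer on frozen random features with constant step-size SGD."""
    W = rng.standard_normal((N, X.shape[1])) / np.sqrt(X.shape[1])  # frozen random weights
    Phi = rf_map(X, W)
    theta = np.zeros(N)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):       # single-sample stochastic updates
            theta -= step * (Phi[i] @ theta - y[i]) * Phi[i]
    return W, theta

for N in [20, 50, 100, 200, 400, 800]:          # interpolation threshold near N = n = 200
    W, theta = sgd_rf(X, y, N)
    mse = np.mean((rf_map(X_test, W) @ theta - y_test) ** 2)
    print(f"N = {N:3d}  test MSE = {mse:.3f}")  # test error often peaks near N ~ n
```

In this toy sweep the printed test error typically rises as N approaches n and falls again in the over-parameterized regime, the qualitative double descent curve the paper analyzes.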
Related papers
- Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution [6.144680854063938]
We consider a variant of stochastic gradient descent (SGD) with a random learning rate to reveal its convergence properties.
We demonstrate that a distribution of a parameter updated by Poisson SGD converges to a stationary distribution under weak assumptions.
arXiv Detail & Related papers (2024-06-23T06:52:33Z) - High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality [7.416689632227865]
We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed noise.
We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime.
We derive the decay rates for the excess risk of ridge regression.
arXiv Detail & Related papers (2023-09-28T14:39:50Z) - Max-affine regression via first-order methods [7.12511675782289]
The max-affine model ubiquitously arises in applications in signal processing and statistics.
We present a non-asymptotic convergence analysis of gradient descent (GD) and mini-batch stochastic gradient descent (SGD) for max-affine regression.
arXiv Detail & Related papers (2023-08-15T23:46:44Z) - Doubly Stochastic Models: Learning with Unbiased Label Noises and
Inference Stability [85.1044381834036]
We investigate the implicit regularization effects of label noises under mini-batch sampling settings of gradient descent.
We find such implicit regularizer would favor some convergence points that could stabilize model outputs against perturbation of parameters.
Our work does not assume that SGD behaves as an Ornstein-Uhlenbeck-like process, and achieves a more general result with proven convergence of the approximation.
arXiv Detail & Related papers (2023-04-01T14:09:07Z) - Double Descent in Random Feature Models: Precise Asymptotic Analysis for
General Convex Regularization [4.8900735721275055]
We provide precise asymptotic expressions for the generalization error of random features regression under a broad class of convex regularization terms.
We numerically demonstrate the predictive capacity of our framework, and show experimentally that the predicted test error is accurate even in the non-asymptotic regime.
arXiv Detail & Related papers (2022-04-06T08:59:38Z) - Stochastic Training is Not Necessary for Generalization [57.04880404584737]
It is widely believed that the implicit regularization of stochastic gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.
In this work, we demonstrate that non-stochastic full-batch training can achieve strong performance on CIFAR-10 that is on par with SGD.
arXiv Detail & Related papers (2021-09-29T00:50:00Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares (a minimal sketch of this constant-stepsize setting appears after this list).
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - On the Generalization of Stochastic Gradient Descent with Momentum [58.900860437254885]
We first show that there exists a convex loss function for which algorithmic stability fails to establish generalization guarantees.
For smooth Lipschitz loss functions, we analyze a modified momentum-based update rule, and show that it admits an upper-bound on the generalization error.
For the special case of strongly convex loss functions, we find a range of momentum values such that multiple epochs of standard SGDM, as a special form of SGDEM, also generalize.
arXiv Detail & Related papers (2021-02-26T18:58:29Z) - When Does Preconditioning Help or Hurt Generalization? [74.25170084614098]
We show how the implicit bias of first- and second-order methods affects the comparison of generalization properties.
We discuss several approaches to manage the bias-variance tradeoff, and the potential benefit of interpolating between GD and NGD.
arXiv Detail & Related papers (2020-06-18T17:57:26Z)
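As referenced in the constant-stepsize SGD entry above, and echoing the abstract's comparison of constant step-size SGD with the exact minimal-norm interpolator, the following is a minimal, hedged sketch of that setting: constant step-size SGD for over-parameterized linear regression started at zero, compared against the minimal-norm least-squares solution. The dimensions, noise level, and step size are illustrative assumptions, not values taken from either paper.

```python
# Minimal sketch (illustrative assumptions, not from either paper): constant
# step-size SGD for over-parameterized linear regression, started at zero,
# compared with the exact minimal-norm interpolator pinv(X) @ y.
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 500                                  # over-parameterized: d > n
X = rng.standard_normal((n, d)) / np.sqrt(d)     # rows have roughly unit norm
theta_star = rng.standard_normal(d)
y = X @ theta_star + 0.1 * rng.standard_normal(n)

theta = np.zeros(d)                              # starting at zero keeps iterates in the row span of X
step = 0.5                                       # constant step size (assumed value)
for _ in range(300):                             # 300 passes of single-sample updates
    for i in rng.permutation(n):
        theta -= step * (X[i] @ theta - y[i]) * X[i]

theta_mn = np.linalg.pinv(X) @ y                 # exact minimal-norm interpolator
print("train MSE (SGD):         ", np.mean((X @ theta - y) ** 2))
print("||theta_sgd - theta_mn||:", np.linalg.norm(theta - theta_mn))
```

Because the data are interpolable (d > n) and the iterate stays in the row span of X, this SGD run drives the training error to zero and ends up close to the minimal-norm solution, which is the comparison both the abstract and the constant-stepsize entry draw.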