The Directional Bias Helps Stochastic Gradient Descent to Generalize in
Kernel Regression Models
- URL: http://arxiv.org/abs/2205.00061v1
- Date: Fri, 29 Apr 2022 19:44:01 GMT
- Title: The Directional Bias Helps Stochastic Gradient Descent to Generalize in
Kernel Regression Models
- Authors: Yiling Luo, Xiaoming Huo, Yajun Mei
- Abstract summary: We study the Stochastic Gradient Descent (SGD) algorithm in nonparametric statistics, in particular kernel regression.
The directional bias property of SGD, which is known in the linear regression setting, is generalized to kernel regression.
- Score: 7.00422423634143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the Stochastic Gradient Descent (SGD) algorithm in nonparametric
statistics: kernel regression in particular. The directional bias property of
SGD, which is known in the linear regression setting, is generalized to the
kernel regression setting. More specifically, we prove that SGD with a moderate
and annealing step-size converges along the direction of the eigenvector that
corresponds to the largest eigenvalue of the Gram matrix. In addition, Gradient
Descent (GD) with a moderate or small step-size converges along the direction
that corresponds to the smallest eigenvalue. These facts are referred to as the
directional bias properties; they may explain why an SGD-computed estimator can
have a smaller generalization error than a GD-computed estimator. The
application of our theory is demonstrated by simulation studies
and a case study that is based on the FashionMNIST dataset.
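Below is a minimal simulation sketch (not the authors' code) of how one might probe this directional-bias claim: it fits a kernel regression with an RBF Gram matrix using full-batch GD with a small step size and single-sample SGD with a larger, annealed step size, then measures how each final error alpha_T - alpha* aligns with the top and bottom eigenvectors of the Gram matrix. The data, kernel bandwidth, step sizes, and annealing schedule are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data and an RBF Gram matrix (sizes and bandwidth are illustrative).
n, d = 100, 5
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dists)

eigvals, eigvecs = np.linalg.eigh(K)          # eigenvalues in ascending order
v_min, v_max = eigvecs[:, 0], eigvecs[:, -1]  # bottom / top eigenvectors of K
alpha_star, *_ = np.linalg.lstsq(K, y, rcond=None)  # least-squares solution

def gd(iters=500):
    """Full-batch GD on (1/2n)||K a - y||^2 with a small, stable step size."""
    a = np.zeros(n)
    eta = 0.9 * n / eigvals[-1] ** 2          # below the GD stability limit
    for _ in range(iters):
        a -= eta * K @ (K @ a - y) / n
    return a

def sgd(iters=2000):
    """Single-sample SGD with a larger step size that is annealed over time."""
    a = np.zeros(n)
    eta0 = 1.0 / np.max((K ** 2).sum(axis=1))  # keeps each rank-one step stable
    for t in range(iters):
        i = rng.integers(n)
        eta = eta0 / (1.0 + t / 500.0)         # annealing schedule (assumption)
        a -= eta * K[i] * (K[i] @ a - y[i])
    return a

def align(err, v):
    """|cosine| between the error direction and a Gram eigenvector."""
    return abs(err @ v) / (np.linalg.norm(err) * np.linalg.norm(v) + 1e-300)

# The paper's directional-bias result predicts that, in the appropriate step-size
# regimes, the SGD error leans toward the top eigenvector while the GD error
# leans toward the bottom one; exact values depend on the choices above.
for name, a in (("GD ", gd()), ("SGD", sgd())):
    err = a - alpha_star
    print(f"{name}: |cos(err, v_max)| = {align(err, v_max):.3f}, "
          f"|cos(err, v_min)| = {align(err, v_min):.3f}")
```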
Related papers
- High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization [83.06112052443233]
This paper studies kernel ridge regression in high dimensions under covariate shifts.
By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance.
For bias, we analyze the regularization of the arbitrary or well-chosen scale, showing that the bias can behave very differently under different regularization scales.
arXiv Detail & Related papers (2024-06-05T12:03:27Z)
- Risk Bounds of Accelerated SGD for Overparameterized Linear Regression [75.27846230182885]
Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning.
Existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization.
arXiv Detail & Related papers (2023-11-23T23:02:10Z)
- Spectrum-Aware Debiasing: A Modern Inference Framework with Applications to Principal Components Regression [1.342834401139078]
We introduce Spectrum-Aware Debiasing, a novel method for high-dimensional regression.
Our approach applies to problems with structured, heavy tails, and low-rank structures.
We demonstrate our method through simulated and real data experiments.
arXiv Detail & Related papers (2023-09-14T15:58:30Z)
- Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability [69.01076284478151]
In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS).
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
arXiv Detail & Related papers (2023-05-19T16:24:47Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- The Heavy-Tail Phenomenon in SGD [7.366405857677226]
We show that, depending on the structure of the Hessian of the loss at the minimum, the SGD iterates will converge to a heavy-tailed stationary distribution.
We translate our results into insights about the behavior of SGD in deep learning.
arXiv Detail & Related papers (2020-06-08T16:43:56Z)