Can Implicit Bias Explain Generalization? Stochastic Convex Optimization
as a Case Study
- URL: http://arxiv.org/abs/2003.06152v3
- Date: Tue, 22 Dec 2020 08:16:38 GMT
- Title: Can Implicit Bias Explain Generalization? Stochastic Convex Optimization
as a Case Study
- Authors: Assaf Dauber and Meir Feder and Tomer Koren and Roi Livni
- Abstract summary: Implicit regularization refers to the tendency of the optimization algorithm towards a certain structured solution that often generalizes well.
We provide a simple construction that rules out the existence of a distribution-independent implicit regularizer that governs the generalization ability of Stochastic Gradient Descent (SGD).
We then demonstrate a learning problem that rules out a very general class of distribution-dependent implicit regularizers from explaining generalization, which includes strongly convex regularizers as well as non-degenerate norm-based regularizations.
- Score: 43.586269524826854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The notion of implicit bias, or implicit regularization, has been suggested
as a means to explain the surprising generalization ability of modern-days
overparameterized learning algorithms. This notion refers to the tendency of
the optimization algorithm towards a certain structured solution that often
generalizes well. Recently, several papers have studied implicit regularization
and were able to identify this phenomenon in various scenarios. We revisit this
paradigm in arguably the simplest non-trivial setup, and study the implicit
bias of Stochastic Gradient Descent (SGD) in the context of Stochastic Convex
Optimization. As a first step, we provide a simple construction that rules out
the existence of a \emph{distribution-independent} implicit regularizer that
governs the generalization ability of SGD. We then demonstrate a learning
problem that rules out a very general class of \emph{distribution-dependent}
implicit regularizers from explaining generalization, which includes strongly
convex regularizers as well as non-degenerate norm-based regularizations.
Certain aspects of our constructions point to significant difficulties in
providing a comprehensive explanation of an algorithm's generalization
performance by solely arguing about its implicit regularization properties.
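To make this hypothesis concrete, one schematic way to phrase implicit regularization (the notation below is illustrative and not taken verbatim from the paper) is that there exists a regularizer $R$ such that the output of SGD on a sample $S=(z_1,\dots,z_n)$ satisfies
\[
  w^{\mathrm{SGD}}_S \;\approx\; \operatorname*{arg\,min}_{w \in \mathcal{W}} \Big\{ \widehat{F}_S(w) + \lambda R(w) \Big\},
  \qquad \widehat{F}_S(w) = \frac{1}{n}\sum_{i=1}^{n} f(w; z_i),
\]
where $R$ is called \emph{distribution-independent} if it may not depend on the underlying distribution $\mathcal{D}$, and \emph{distribution-dependent} otherwise; the generalization of SGD would then be attributed to the generalization of such a regularized empirical risk minimizer. The constructions in the paper exhibit convex problems on which SGD generalizes well even though no regularizer from the stated classes can account for it.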
Related papers
- A Unified Approach to Controlling Implicit Regularization via Mirror
Descent [18.536453909759544]
Mirror descent (MD) is a notable generalization of gradient descent (GD).
We show that MD can be implemented efficiently and enjoys fast convergence under suitable conditions.
arXiv Detail & Related papers (2023-06-24T03:57:26Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Iterative regularization for low complexity regularizers [18.87017835436693]
Iterative regularization exploits the implicit bias of an optimization algorithm to regularize ill-posed problems.
We propose and study the first iterative regularization procedure able to handle biases described by non smooth and non strongly convex functionals.
arXiv Detail & Related papers (2022-02-01T14:09:00Z) - Stochastic Training is Not Necessary for Generalization [57.04880404584737]
It is widely believed that the implicit regularization of stochastic gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.
In this work, we demonstrate that non-stochastic full-batch training can achieve strong performance on CIFAR-10 that is on par with SGD.
arXiv Detail & Related papers (2021-09-29T00:50:00Z) - The Benefits of Implicit Regularization from SGD in Least Squares
Problems [116.85246178212616]
Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice.
We compare the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression; a small numerical sketch of such a comparison appears after this list.
arXiv Detail & Related papers (2021-08-10T09:56:47Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes: the sparse regime and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z) - Failures of model-dependent generalization bounds for least-norm
interpolation [39.97534972432276]
We consider bounds on the generalization performance of the least-norm linear regressor.
For a variety of natural joint distributions on training examples, any valid generalization bound must sometimes be very loose.
arXiv Detail & Related papers (2020-10-16T16:30:05Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.