Relative Flatness and Generalization
- URL: http://arxiv.org/abs/2001.00939v4
- Date: Thu, 4 Nov 2021 15:00:25 GMT
- Title: Relative Flatness and Generalization
- Authors: Henning Petzka, Michael Kamp, Linara Adilova, Cristian Sminchisescu,
Mario Boley
- Abstract summary: Flatness of the loss curve is conjectured to be connected to the generalization ability of machine learning models.
It is still an open theoretical problem why and under which circumstances flatness is connected to generalization.
- Score: 31.307340632319583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Flatness of the loss curve is conjectured to be connected to the
generalization ability of machine learning models, in particular neural
networks. While it has been empirically observed that flatness measures
consistently correlate strongly with generalization, it is still an open
theoretical problem why and under which circumstances flatness is connected to
generalization, in particular in light of reparameterizations that change
certain flatness measures but leave generalization unchanged. We investigate
the connection between flatness and generalization by relating it to the
interpolation from representative data, deriving notions of representativeness,
and feature robustness. The notions allow us to rigorously connect flatness and
generalization and to identify conditions under which the connection holds.
Moreover, they give rise to a novel, but natural relative flatness measure that
correlates strongly with generalization, simplifies to ridge regression for
ordinary least squares, and solves the reparameterization issue.
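The abstract's claim that the relative flatness measure simplifies to ridge regression for ordinary least squares can be made concrete with a small sketch. The exact measure is defined in the paper; the snippet below only assumes a simplified surrogate of the form <w, w> * tr(H), with H the Hessian of the empirical squared loss, and checks numerically that penalizing this surrogate reproduces the ridge regression solution (the function names are illustrative, not from the paper).

```python
# Hedged illustration (not the authors' code): for ordinary least squares, the
# Hessian of the loss is constant, so penalizing a relative-flatness-style
# surrogate <w, w> * tr(H) is the same as ridge regression.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def loss_hessian(X):
    # Hessian of L(w) = (1/n) * ||X w - y||^2 with respect to w: (2/n) * X^T X.
    return (2.0 / X.shape[0]) * X.T @ X

def relative_flatness_surrogate(w, H):
    # Simplified scalar surrogate: <w, w> * tr(H).
    return float(w @ w) * np.trace(H)

H = loss_hessian(X)
lam = 0.1

# Minimize loss + lam * surrogate directly ...
objective = lambda w: np.mean((X @ w - y) ** 2) + lam * relative_flatness_surrogate(w, H)
w_penalized = minimize(objective, np.zeros(d)).x

# ... and compare with ridge regression at the matching strength alpha = n * lam * tr(H),
# obtained by setting the gradient of the penalized objective to zero.
alpha = n * lam * np.trace(H)
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)
print("max |w_penalized - w_ridge| =", float(np.max(np.abs(w_penalized - w_ridge))))
```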
Related papers
- Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation [59.138470433237615]
We introduce statistical metrics that quantify both the linguistic and visual skew of a dataset for relational learning.
We show that systematically controlled metrics are strongly predictive of generalization performance.
This work points to enhancing data diversity and balance, rather than simply scaling up the absolute dataset size, as an important direction for improving generalization.
arXiv Detail & Related papers (2024-03-25T03:18:39Z)
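As a rough illustration of what a skew metric for relational data can look like, the sketch below scores a caption set by one minus the normalized entropy of how often each relation (or full subject-relation-object triple) appears. This generic imbalance score and the toy captions are assumptions for illustration, not the statistical metrics defined in the paper above.

```python
# Hedged sketch: a generic normalized-entropy imbalance score for relational
# captions (illustrative only; not the paper's linguistic/visual skew metrics).
import math
from collections import Counter

captions = [  # made-up (subject, relation, object) triples
    ("cup", "on", "table"), ("cup", "on", "table"), ("book", "on", "table"),
    ("cup", "under", "table"), ("dog", "on", "sofa"), ("cat", "on", "sofa"),
]

def skew(items):
    counts = Counter(items)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(counts)) if len(counts) > 1 else 1.0
    return 1.0 - entropy / max_entropy   # 0 = perfectly balanced, 1 = fully skewed

print("relation skew:", round(skew([r for _, r, _ in captions]), 3))
print("triple skew:  ", round(skew(captions), 3))
```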
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We re-examine when and where double descent occurs, and show that its location is not inherently tied to the interpolation threshold p=n.
This resolves apparent tensions between double descent and classical statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
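One way to make the rethinking of parameter counting concrete is the classical effective-parameter count for linear smoothers, the trace of the hat matrix. The sketch below applies it to ridge regression to show that effective complexity can sit far below the raw parameter count p, which is one reason the p = n threshold need not be special. This is a standard statistics construction used purely for illustration, not necessarily the measure used in the paper above.

```python
# Hedged sketch: effective parameters of ridge regression, df(lam) = tr(S) with
# S = X (X^T X + lam * I)^{-1} X^T (a classical construction, shown for illustration).
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                      # heavily overparameterized: p >> n
X = rng.normal(size=(n, p))

def effective_params(X, lam):
    p = X.shape[1]
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # ridge "hat" matrix
    return float(np.trace(S))

for lam in (1e-6, 10.0, 1000.0):
    print(f"lambda={lam:g}: raw parameters = {p}, effective parameters = "
          f"{effective_params(X, lam):.1f}")
```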
- FAM: Relative Flatness Aware Minimization [5.132856559837775]
Optimizing for flatness was proposed as early as 1994 by Hochreiter and Schmidhuber.
Recent theoretical work suggests that a particular relative flatness measure can be connected to generalization.
We derive a regularizer based on this relative flatness measure that is easy to compute, fast, efficient, and works with arbitrary loss functions.
arXiv Detail & Related papers (2023-07-05T14:48:24Z)
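As a hedged sketch of what a relative-flatness regularizer could look like in code, the snippet below adds a penalty of the form ||W||^2 times a Hutchinson estimate of the Hessian trace of the loss with respect to one layer's weights. The choice of layer, the exact penalty form, and all hyperparameters are illustrative assumptions, not the authors' FAM implementation.

```python
# Hedged sketch (not the authors' FAM code): penalize ||W||^2 * tr(H_W), where H_W
# is the Hessian of the loss w.r.t. one layer's weights, estimated with Hutchinson's
# method via double backward.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
penalized_weight = model[2].weight      # assumed choice: penalize the last layer
criterion = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
lam = 1e-3
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))

def hutchinson_trace(loss, param, n_probes=1):
    # Stochastic, differentiable estimate of tr(d^2 loss / d param^2).
    (grad,) = torch.autograd.grad(loss, param, create_graph=True)
    estimate = loss.new_zeros(())
    for _ in range(n_probes):
        v = torch.randn_like(param).sign()                      # Rademacher probe
        (hv,) = torch.autograd.grad(grad, param, grad_outputs=v, create_graph=True)
        estimate = estimate + (v * hv).sum()
    return estimate / n_probes

for step in range(5):
    opt.zero_grad()
    loss = criterion(model(x), y)
    penalty = penalized_weight.pow(2).sum() * hutchinson_trace(loss, penalized_weight)
    (loss + lam * penalty).backward()
    opt.step()
    print(step, float(loss))
```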
- The Inductive Bias of Flatness Regularization for Deep Matrix Factorization [58.851514333119255]
This work takes a first step toward understanding the inductive bias of minimum-Hessian-trace solutions in deep linear networks.
We show that for any depth greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Scale-invariant Bayesian Neural Networks with Connectivity Tangent
- Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel [30.088226334627375]
We show that flatness and generalization bounds can be changed arbitrarily according to the scale of a parameter.
We propose new prior and posterior distributions that are invariant to scaling transformations, obtained by decomposing the scale and connectivity of parameters.
We empirically demonstrate our posterior provides effective flatness and calibration measures with low complexity.
arXiv Detail & Related papers (2022-09-30T03:31:13Z)
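The claim that flatness can be changed arbitrarily by rescaling, which is also the reparameterization issue addressed by the relative flatness measure in the main paper above, can be reproduced in a few lines: for a two-layer ReLU model, multiplying the first layer by alpha and dividing the second by alpha leaves the function and the loss unchanged while a Hessian-trace sharpness measure changes. The tiny model and data below are illustrative assumptions.

```python
# Hedged demo: ReLU rescaling keeps the loss fixed but changes Hessian-trace sharpness.
import torch

torch.manual_seed(0)
x, y = torch.randn(32, 5), torch.randn(32, 1)
W1, W2 = 0.5 * torch.randn(5, 8), 0.5 * torch.randn(8, 1)

def loss_fn(W1, W2):
    return ((torch.relu(x @ W1) @ W2 - y) ** 2).mean()

def hessian_trace(W1, W2):
    # Trace of the Hessian of the loss w.r.t. (W1, W2), summed over the diagonal blocks.
    H = torch.autograd.functional.hessian(loss_fn, (W1, W2))
    total = 0.0
    for i, W in enumerate((W1, W2)):
        n = W.numel()
        total = total + H[i][i].reshape(n, n).diagonal().sum()
    return float(total)

for alpha in (1.0, 10.0):
    W1a, W2a = W1 * alpha, W2 / alpha       # function-preserving reparameterization
    print(f"alpha={alpha}: loss={float(loss_fn(W1a, W2a)):.6f}, "
          f"tr(H)={hessian_trace(W1a, W2a):.4f}")
```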
- Why Flatness Correlates With Generalization For Deep Neural Networks [0.0]
We argue that local flatness measures correlate with generalization because they are local approximations to a global property: the volume of the region of parameter space that maps to a given function.
For functions that give zero error on a test set, this volume is directly proportional to the Bayesian posterior.
Some variants of SGD can break the flatness-generalization correlation, while the volume-generalization correlation remains intact.
arXiv Detail & Related papers (2021-03-10T17:44:52Z)
- Implicit Regularization in Tensor Factorization [17.424619189180675]
Implicit regularization in deep learning is perceived as a tendency of gradient-based optimization to fit training data with predictors of minimal "complexity".
We argue that tensor rank may pave the way toward explaining both the implicit regularization of neural networks and the properties of real-world data that translate it into generalization.
arXiv Detail & Related papers (2021-02-19T15:10:26Z)
- Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes: the sparse regime and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z)
- Implicit Regularization in ReLU Networks with the Square Loss [56.70360094597169]
We show that it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters.
Our results suggest that a more general framework may be needed to understand implicit regularization for nonlinear predictors.
arXiv Detail & Related papers (2020-12-09T16:48:03Z)
- Overparameterization and generalization error: weighted trigonometric interpolation [4.631723879329972]
We study a random Fourier series model, where the task is to estimate the unknown Fourier coefficients from equidistant samples.
We show precisely how a bias towards smooth interpolants, in the form of weighted trigonometric interpolation, can lead to smaller generalization error.
arXiv Detail & Related papers (2020-06-15T15:53:22Z)
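As a rough sketch of the weighted-interpolation idea from the paper above (not its exact model or weights), the snippet below fits an overparameterized real Fourier feature model to noisy equidistant samples by minimum weighted-norm interpolation. Weighting high frequencies more heavily biases the interpolant toward smooth functions and, in this toy setup, lowers the test error relative to the plain minimum-norm interpolant.

```python
# Hedged sketch: minimum weighted-norm trigonometric interpolation of equidistant
# samples; the frequency weights are an assumed illustrative choice.
import numpy as np

rng = np.random.default_rng(1)
n, p = 16, 64                               # n equidistant samples, p > n Fourier features
t = np.arange(n) / n
t_test = np.linspace(0, 1, 400, endpoint=False)

def fourier_features(t, p):
    cols = [np.ones_like(t)]                # [1, cos(2*pi*k*t), sin(2*pi*k*t), ...]
    for k in range(1, p // 2 + 1):
        cols.append(np.cos(2 * np.pi * k * t))
        cols.append(np.sin(2 * np.pi * k * t))
    return np.stack(cols, axis=1)[:, :p]

def freq_weights(p, power):
    w = [1.0]                               # weight (1 + k)^power for frequency k
    for k in range(1, p // 2 + 1):
        w += [float((1 + k) ** power)] * 2
    return np.array(w[:p])

def min_weighted_norm_interpolant(A, y, w):
    # argmin ||diag(w) c||_2 subject to A c = y, via the column-scaled pseudoinverse.
    z = np.linalg.pinv(A / w) @ y
    return z / w

f = lambda t: np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)   # smooth target
y = f(t) + 0.1 * rng.normal(size=n)
A, A_test = fourier_features(t, p), fourier_features(t_test, p)

for power in (0.0, 2.0):                    # 0.0 recovers the plain min-norm interpolant
    c = min_weighted_norm_interpolant(A, y, freq_weights(p, power))
    print(f"weight power {power}: test MSE = {np.mean((A_test @ c - f(t_test)) ** 2):.4f}")
```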
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.