Instance-Dependent Generalization Bounds via Optimal Transport
- URL: http://arxiv.org/abs/2211.01258v4
- Date: Mon, 13 Nov 2023 13:37:52 GMT
- Title: Instance-Dependent Generalization Bounds via Optimal Transport
- Authors: Songyan Hou, Parnian Kassraie, Anastasis Kratsios, Andreas Krause,
Jonas Rothfuss
- Abstract summary: Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
- Score: 51.71650746285469
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing generalization bounds fail to explain crucial factors that drive the
generalization of modern neural networks. Since such bounds often hold
uniformly over all parameters, they suffer from over-parametrization and fail
to account for the strong inductive bias of initialization and stochastic
gradient descent. As an alternative, we propose a novel optimal transport
interpretation of the generalization problem. This allows us to derive
instance-dependent generalization bounds that depend on the local Lipschitz
regularity of the learned prediction function in the data space. Therefore, our
bounds are agnostic to the parametrization of the model and work well when the
number of training samples is much smaller than the number of parameters. With
small modifications, our approach yields accelerated rates for data on
low-dimensional manifolds and guarantees under distribution shifts. We
empirically analyze our generalization bounds for neural networks, showing that
the bound values are meaningful and capture the effect of popular
regularization methods during training.
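The bound itself is stated formally in the paper; as a rough, non-authoritative illustration of the quantities it is built from, the sketch below estimates (i) the local Lipschitz constant of a learned prediction function around each training instance via random finite differences, and (ii) an empirical 1-Wasserstein transport cost between two samples via a minimum-cost matching. The predictor `predict`, the probing radius, and the toy data are hypothetical stand-ins of our own choosing, not the authors' construction.

```python
# Illustrative sketch only: estimates the two ingredients that instance-dependent,
# parametrization-agnostic bounds of this flavour are built from -- local Lipschitz
# constants of the learned predictor in data space, and a transport cost between
# samples. `predict` and the probing radius are hypothetical stand-ins.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

def predict(x):
    # Stand-in for a trained prediction function f: R^d -> R.
    return np.tanh(x @ np.linspace(-1.0, 1.0, x.shape[-1]))

def local_lipschitz(x, radius=0.1, n_probe=64):
    """Crude estimate of the Lipschitz constant of `predict` on a sphere
    of the given radius around the instance x, via finite differences."""
    noise = rng.normal(size=(n_probe, x.shape[-1]))
    noise *= radius / np.linalg.norm(noise, axis=1, keepdims=True)
    probes = x + noise
    num = np.abs(predict(probes) - predict(x[None, :]))
    den = np.linalg.norm(probes - x, axis=1)
    return float(np.max(num / den))

def wasserstein1(X, Y):
    """Empirical 1-Wasserstein distance between two equal-size point clouds
    with uniform weights, computed as a minimum-cost perfect matching."""
    C = cdist(X, Y)                      # pairwise Euclidean costs
    rows, cols = linear_sum_assignment(C)
    return float(C[rows, cols].mean())

# Toy data: n training points and n "test" points in d dimensions.
d, n = 8, 50
X_train = rng.normal(size=(n, d))
X_test = rng.normal(size=(n, d))

lips = np.array([local_lipschitz(x) for x in X_train])
W1 = wasserstein1(X_train, X_test)

print(f"mean local Lipschitz estimate: {lips.mean():.3f}")
print(f"empirical W1(train, test):     {W1:.3f}")
```

Intuitively, and only as a numerical mirror of the abstract above: smooth local behaviour of the learned function around the training instances and a small transport cost between the samples are the kinds of quantities that keep such an instance-dependent bound small, independently of how many parameters the model has.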
Related papers
- Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization [52.16435732772263]
Second-order optimization has been shown to accelerate the training of deep neural networks in many applications.
However, generalization properties of second-order methods are still being debated.
We show for the first time that exact Gauss-Newton (GN) updates take on a tractable form in a class of deep architectures.
arXiv Detail & Related papers (2024-11-12T17:58:40Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Generalization in Kernel Regression Under Realistic Assumptions [41.345620270267446]
We provide rigorous bounds for common kernels, for any amount of regularization and noise, any input dimension, and any number of samples.
Our results imply benign overfitting in high input dimensions, nearly tempered overfitting in fixed dimensions, and explicit convergence rates for regularized regression.
As a by-product, we obtain time-dependent bounds for neural networks trained in the kernel regime.
arXiv Detail & Related papers (2023-12-26T10:55:20Z)
- Measuring Generalization with Optimal Transport [111.29415509046886]
We develop margin-based generalization bounds, where the margins are normalized with optimal transport costs.
Our bounds robustly predict the generalization error, given training data and network parameters, on large scale datasets.
arXiv Detail & Related papers (2021-06-07T03:04:59Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Empirically, inductive biases are central in preventing overfitting.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares; a toy numpy comparison in this spirit appears in the sketch after this list.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Failures of model-dependent generalization bounds for least-norm interpolation [39.97534972432276]
We consider bounds on the generalization performance of the least-norm linear regressor.
For a variety of natural joint distributions on training examples, any valid generalization bound must sometimes be very loose.
arXiv Detail & Related papers (2020-10-16T16:30:05Z)
- Benign overfitting in ridge regression [0.0]
We provide non-asymptotic generalization bounds for overparametrized ridge regression.
We identify when small or negative regularization is sufficient for obtaining small generalization error.
arXiv Detail & Related papers (2020-09-29T20:00:31Z)
- When Does Preconditioning Help or Hurt Generalization? [74.25170084614098]
We show how the implicit bias of first- and second-order methods affects the comparison of generalization properties.
We discuss several approaches to manage the bias-variance tradeoff, and the potential benefit of interpolating between GD and NGD.
arXiv Detail & Related papers (2020-06-18T17:57:26Z)
- Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions [41.7567932118769]
Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
arXiv Detail & Related papers (2020-06-16T04:27:38Z)
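As referenced in the constant-stepsize SGD entry above, the following toy sketch is our own construction under simplifying assumptions (Gaussian features, squared loss, a fixed step size, initialization at zero), not the setting analyzed in that paper. It contrasts the iterate of constant-stepsize SGD with the minimum-norm ordinary-least-squares interpolant in an overparametrized linear regression, the kind of algorithmic regularization being compared there.

```python
# Toy illustration (our own construction): in an overparametrized linear
# regression, constant-stepsize SGD started at zero stays in the row span of
# the data and ends up close to the minimum-norm least-squares interpolant.
import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 200                           # fewer samples than parameters
X = rng.normal(size=(n, d)) / np.sqrt(d)  # features with covariance I/d
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

# Minimum-norm interpolant (ordinary least squares via the pseudoinverse).
w_ols = np.linalg.pinv(X) @ y

# Constant-stepsize SGD on the squared loss, one sample per step.
w = np.zeros(d)
step = 0.5
for _ in range(20000):
    i = rng.integers(n)
    residual = X[i] @ w - y[i]
    w -= step * residual * X[i]

print("||w_sgd - w_minnorm|| =", np.linalg.norm(w - w_ols))
print("train error (SGD)     =", np.mean((X @ w - y) ** 2))
# Excess-risk proxy under x ~ N(0, I/d): (w - w*)^T Cov (w - w*).
print("excess-risk proxy     =", np.mean((w - w_star) ** 2))
```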
This list is automatically generated from the titles and abstracts of the papers in this site.