Architecture independent generalization bounds for overparametrized deep ReLU networks
- URL: http://arxiv.org/abs/2504.05695v2
- Date: Wed, 09 Apr 2025 17:29:05 GMT
- Title: Architecture independent generalization bounds for overparametrized deep ReLU networks
- Authors: Thomas Chen, Chun-Kai Kevin Chien, Patricia Muñoz Ewald, Andrew G. Moore,
- Abstract summary: We prove that overparametrized neural networks are able to generalize with a test error independent of the level of overparametrization. For overparametrized deep ReLU networks with a training sample size bounded by the input space dimension, we explicitly construct zero loss minimizers without use of gradient descent.
- Score: 0.9687141267566189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove that overparametrized neural networks are able to generalize with a test error that is independent of the level of overparametrization, and independent of the Vapnik-Chervonenkis (VC) dimension. We prove explicit bounds that only depend on the metric geometry of the test and training sets, on the regularity properties of the activation function, and on the operator norms of the weights and norms of biases. For overparametrized deep ReLU networks with a training sample size bounded by the input space dimension, we explicitly construct zero loss minimizers without use of gradient descent, and prove that the generalization error is independent of the network architecture.
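To illustrate why operator norms of the weights and the distance between test and training points can control the output gap independently of layer width, here is a minimal sketch of the standard Lipschitz estimate for a ReLU network. It is not the paper's bound or construction; the layer widths, the random weights, and the helper names `forward` and `lipschitz_bound` are assumptions made only for this example.

```python
import numpy as np

def relu(z):
    # ReLU is 1-Lipschitz, so it never increases distances between inputs.
    return np.maximum(z, 0.0)

def forward(weights, biases, x):
    # Hidden layers use ReLU; the last layer is affine with no activation.
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return weights[-1] @ h + biases[-1]

def lipschitz_bound(weights):
    # Product of spectral norms (largest singular values) of the weights:
    # an upper bound on the Lipschitz constant of the network.
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

rng = np.random.default_rng(0)
dims = [5, 64, 64, 3]  # hypothetical layer widths, chosen for illustration
weights = [rng.normal(size=(m, n)) / np.sqrt(n)
           for n, m in zip(dims[:-1], dims[1:])]
biases = [rng.normal(size=m) for m in dims[1:]]

x_train = rng.normal(size=dims[0])
x_test = x_train + 0.1 * rng.normal(size=dims[0])  # a nearby test point

# Output gap between the test point and the training point.
gap = np.linalg.norm(forward(weights, biases, x_test)
                     - forward(weights, biases, x_train))
# Lipschitz estimate: product of operator norms times the input distance.
bound = lipschitz_bound(weights) * np.linalg.norm(x_test - x_train)

print(f"output gap: {gap:.4f}, Lipschitz estimate: {bound:.4f}")
```

The estimate involves only the weight norms and the input distance, not the number of parameters; biases cancel in the difference, so they do not appear in this particular estimate.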
Related papers
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNet in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z)
- On Size-Independent Sample Complexity of ReLU Networks [9.15749739027059]
We study the sample complexity of learning ReLU neural networks from the point of view of generalization.
We estimate the Rademacher complexity of the associated function class.
arXiv Detail & Related papers (2023-06-03T03:41:33Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- A Lifted Bregman Formulation for the Inversion of Deep Neural Networks [28.03724379169264]
We propose a novel framework for the regularised inversion of deep neural networks.
The framework lifts the parameter space into a higher dimensional space by introducing auxiliary variables.
We present theoretical results and support their practical application with numerical examples.
arXiv Detail & Related papers (2023-03-01T20:30:22Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z)
- Global convergence of ResNets: From finite to infinite width using linear parameterization [0.0]
We study Residual Networks (ResNets) in which the residual block has linear parametrization while still being nonlinear.
In this limit, we prove a local Polyak-Lojasiewicz inequality, retrieving the lazy regime.
Our analysis leads to a practical and quantified recipe.
arXiv Detail & Related papers (2021-12-10T13:38:08Z)
- Lower Bounds on the Generalization Error of Nonlinear Learning Models [2.1030878979833467]
We study in this paper lower bounds for the generalization error of models derived from multi-layer neural networks, in the regime where the size of the layers is commensurate with the number of samples in the training data.
We show that unbiased estimators have unacceptable performance for such nonlinear networks in this regime.
We derive explicit generalization lower bounds for general biased estimators, in the cases of linear regression and of two-layered networks.
arXiv Detail & Related papers (2021-03-26T20:37:54Z)
- Robustness to Pruning Predicts Generalization in Deep Neural Networks [29.660568281957072]
We introduce prunability: the smallest fraction of a network's parameters that can be kept while pruning without adversely affecting its training loss.
We show that this measure is highly predictive of a model's generalization performance across a large set of convolutional networks trained on CIFAR-10.
arXiv Detail & Related papers (2021-03-10T11:39:14Z)
- Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes -- the sparse regime, and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)