A Near Complete Nonasymptotic Generalization Theory For Multilayer Neural Networks: Beyond the Bias-Variance Tradeoff
- URL: http://arxiv.org/abs/2503.02129v1
- Date: Mon, 03 Mar 2025 23:34:12 GMT
- Title: A Near Complete Nonasymptotic Generalization Theory For Multilayer Neural Networks: Beyond the Bias-Variance Tradeoff
- Authors: Hao Yu, Xiangyang Ji,
- Abstract summary: We propose a nonasymptotic generalization theory for multilayer neural networks with arbitrary Lipschitz activations and general Lipschitz loss functions.<n>In particular, it doens't require the boundness of loss function, as commonly assumed in the literature.<n>We show the near minimax optimality of our theory for multilayer ReLU networks for regression problems.
- Score: 57.25901375384457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a first near complete (that will make explicit sense in the main text) nonasymptotic generalization theory for multilayer neural networks with arbitrary Lipschitz activations and general Lipschitz loss functions (with some very mild conditions). In particular, it doens't require the boundness of loss function, as commonly assumed in the literature. Our theory goes beyond the bias-variance tradeoff, aligned with phenomenon typically encountered in deep learning. It is therefore sharp different with other existing nonasymptotic generalization error bounds for neural networks. More explicitly, we propose an explicit generalization error upper bound for multilayer neural networks with arbitrary Lipschitz activations $\sigma$ with $\sigma(0)=0$ and broad enough Lipschitz loss functions, without requiring either the width, depth or other hyperparameters of the neural network approaching infinity, a specific neural network architect (e.g. sparsity, boundness of some norms), a particular activation function, a particular optimization algorithm or boundness of the loss function, and with taking the approximation error into consideration. General Lipschitz activation can also be accommodated into our framework. A feature of our theory is that it also considers approximation errors. Furthermore, we show the near minimax optimality of our theory for multilayer ReLU networks for regression problems. Notably, our upper bound exhibits the famous double descent phenomenon for such networks, which is the most distinguished characteristic compared with other existing results. This work emphasizes a view that many classical results should be improved to embrace the unintuitive characteristics of deep learning to get a better understanding of it.
Related papers
- Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax
Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes [7.433327915285969]
We prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss.
We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence.
arXiv Detail & Related papers (2024-01-08T23:54:46Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Improving Lipschitz-Constrained Neural Networks by Learning Activation
Functions [14.378778606939665]
Lipschitz-constrained neural networks have several advantages over unconstrained ones and can be applied to a variety of problems.
We show that neural networks with learnable 1-Lipschitz linear splines are known to be more expressive.
Our numerical experiments show that our trained networks compare favorably with existing 1-Lipschitz neural architectures.
arXiv Detail & Related papers (2022-10-28T15:56:55Z) - On the Omnipresence of Spurious Local Minima in Certain Neural Network
Training Problems [0.0]
We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output.
It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine.
arXiv Detail & Related papers (2022-02-23T14:41:54Z) - Non-Vacuous Generalisation Bounds for Shallow Neural Networks [5.799808780731661]
We focus on a specific class of shallow neural networks with a single hidden layer.
We derive new generalisation bounds through the PAC-Bayesian theory.
Our bounds are empirically non-vacuous when the network is trained with vanilla gradient descent on MNIST and Fashion-MNIST.
arXiv Detail & Related papers (2022-02-03T14:59:51Z) - The Eigenlearning Framework: A Conservation Law Perspective on Kernel
Regression and Wide Neural Networks [1.6519302768772166]
We derive simple closed-form estimates for the test risk and other generalization metrics of kernel ridge regression.
We identify a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions.
arXiv Detail & Related papers (2021-10-08T06:32:07Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - The Many Faces of 1-Lipschitz Neural Networks [1.911678487931003]
We show that 1-Lipschitz neural network can fit arbitrarily difficult frontier making them as expressive as classical ones.
We also study the link between classification with 1-Lipschitz network and optimal transport thanks to regularized versions of Kantorovich-Rubinstein duality theory.
arXiv Detail & Related papers (2021-04-11T20:31:32Z) - Lipschitz Bounded Equilibrium Networks [3.2872586139884623]
This paper introduces new parameterizations of equilibrium neural networks, i.e. networks defined by implicit equations.
The new parameterization admits a Lipschitz bound during training via unconstrained optimization.
In image classification experiments we show that the Lipschitz bounds are very accurate and improve robustness to adversarial attacks.
arXiv Detail & Related papers (2020-10-05T01:00:40Z) - Generalization bound of globally optimal non-convex neural network
training: Transportation map estimation by infinite dimensional Langevin
dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking limit of infinite width of the network to show its global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z) - On Lipschitz Regularization of Convolutional Layers using Toeplitz
Matrix Theory [77.18089185140767]
Lipschitz regularity is established as a key property of modern deep learning.
computing the exact value of the Lipschitz constant of a neural network is known to be NP-hard.
We introduce a new upper bound for convolutional layers that is both tight and easy to compute.
arXiv Detail & Related papers (2020-06-15T13:23:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.