Uniform Generalization Bounds for Overparameterized Neural Networks
- URL: http://arxiv.org/abs/2109.06099v1
- Date: Mon, 13 Sep 2021 16:20:13 GMT
- Title: Uniform Generalization Bounds for Overparameterized Neural Networks
- Authors: Sattar Vakili, Michael Bromberg, Da-shan Shiu, Alberto Bernacchia
- Abstract summary: We prove uniform generalization bounds for overparameterized neural networks in kernel regimes.
Our bounds capture the exact error rates depending on the differentiability of the activation functions.
We show the equivalence between the RKHS corresponding to the NT kernel and its counterpart corresponding to the Matérn family of kernels.
- Score: 5.945320097465419
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An interesting observation in artificial neural networks is their favorable
generalization error despite typically being extremely overparameterized. It is
well known that classical statistical learning methods often result in vacuous
generalization errors in the case of overparameterized neural networks.
Adopting the recently developed Neural Tangent (NT) kernel theory, we prove
uniform generalization bounds for overparameterized neural networks in kernel
regimes, when the true data generating model belongs to the reproducing kernel
Hilbert space (RKHS) corresponding to the NT kernel. Importantly, our bounds
capture the exact error rates depending on the differentiability of the
activation functions. In order to establish these bounds, we propose the
information gain of the NT kernel as a measure of complexity of the learning
problem. Our analysis uses a Mercer decomposition of the NT kernel in the basis
of spherical harmonics and the decay rate of the corresponding eigenvalues. As
a byproduct of our results, we show the equivalence between the RKHS
corresponding to the NT kernel and its counterpart corresponding to the
Matérn family of kernels, which induces a very general class of models. We
further discuss the implications of our analysis for some recent results on the
regret bounds for reinforcement learning algorithms, which use
overparameterized neural networks.
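As a rough numerical companion to the abstract (not code from the paper), the sketch below evaluates the closed-form two-layer ReLU NTK on the unit sphere, a Matérn kernel, the eigenvalue decay of their Gram matrices, and the information-gain quantity gamma = 1/2 * log det(I + K / sigma^2) that the paper proposes as the complexity measure. The NTK normalization convention, the choice nu = 3/2, the lengthscale, the noise variance, and the sample size are all illustrative assumptions.
```python
# Minimal sketch (assumptions noted above): compare the spectra of an NTK Gram
# matrix and a Matern Gram matrix on the sphere, and compute the information-gain
# proxy gamma = 1/2 * log det(I + K / noise_var).
import numpy as np

rng = np.random.default_rng(0)

def sphere_points(n, d):
    """Draw n points uniformly on the unit sphere S^{d-1}."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def ntk_relu_2layer(X):
    """Closed-form infinite-width two-layer ReLU NTK on the unit sphere,
    written via the arc-cosine kernels; constants depend on the
    parameterization convention and do not affect decay rates."""
    U = np.clip(X @ X.T, -1.0, 1.0)                 # cosines of pairwise angles
    theta = np.arccos(U)
    kappa1 = (U * (np.pi - theta) + np.sin(theta)) / np.pi   # NNGP part
    kappa0 = (np.pi - theta) / np.pi                          # derivative part
    return U * kappa0 + kappa1

def matern32(X, lengthscale=1.0):
    """Matern kernel with smoothness nu = 3/2 (illustrative choice)."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    r = np.sqrt(np.maximum(d2, 0.0))
    a = np.sqrt(3.0) * r / lengthscale
    return (1.0 + a) * np.exp(-a)

def information_gain(K, noise_var=0.1):
    """gamma = 1/2 * log det(I + K / noise_var)."""
    n = K.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + K / noise_var)
    return 0.5 * logdet

X = sphere_points(n=500, d=3)
K_ntk, K_mat = ntk_relu_2layer(X), matern32(X)

# Eigenvalues sorted in decreasing order.
ev_ntk = np.sort(np.linalg.eigvalsh(K_ntk))[::-1]
ev_mat = np.sort(np.linalg.eigvalsh(K_mat))[::-1]
print("NTK    eigenvalues 1,10,100:", ev_ntk[[0, 9, 99]])
print("Matern eigenvalues 1,10,100:", ev_mat[[0, 9, 99]])
print("information gain (NTK):    %.2f" % information_gain(K_ntk))
print("information gain (Matern): %.2f" % information_gain(K_mat))
```
If the two RKHSs are indeed equivalent, the two empirical spectra should decay at a matching polynomial rate; this is the finite-sample analogue of the Matérn equivalence stated in the abstract, not a substitute for the paper's Mercer-decomposition argument.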
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural-networks and their training applicable to neural networks of arbitrary width, depth and topology.
We also present a novel exact representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - On the Eigenvalue Decay Rates of a Class of Neural-Network Related
Kernel Functions Defined on General Domains [10.360517127652185]
We provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain.
This class of kernel functions includes, but is not limited to, the neural tangent kernel associated with neural networks of different depths and various activation functions.
arXiv Detail & Related papers (2023-05-04T08:54:40Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but it is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that the averaged gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural
Networks [17.188280334580195]
We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples.
Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK).
We verify our theory with simulations on synthetic data and the MNIST dataset (a toy learning-curve sketch in the same spirit appears after this list).
arXiv Detail & Related papers (2020-02-07T00:03:40Z)
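In the same spirit as the learning-curve entry above, and only as an illustration, the short sketch below runs kernel ridge regression with the two-layer ReLU NTK and reports how the test error shrinks as the training-set size grows. The target function, noise level, and ridge parameter are arbitrary choices and do not come from any of the listed papers.
```python
# Toy learning-curve sketch (assumptions noted above): kernel ridge regression
# with the two-layer ReLU NTK from the previous sketch, test MSE vs. sample size.
import numpy as np

rng = np.random.default_rng(1)

def ntk(X, Z):
    """Two-layer ReLU NTK between row-normalized inputs X and Z."""
    U = np.clip(X @ Z.T, -1.0, 1.0)
    theta = np.arccos(U)
    return U * (np.pi - theta) / np.pi + (U * (np.pi - theta) + np.sin(theta)) / np.pi

def sphere(n, d=3):
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

f_star = lambda X: np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2      # hypothetical smooth target
X_test = sphere(2000)
y_test = f_star(X_test)

for n in [50, 200, 800]:
    X_tr = sphere(n)
    y_tr = f_star(X_tr) + 0.1 * rng.standard_normal(n)        # noisy observations
    alpha = np.linalg.solve(ntk(X_tr, X_tr) + 1e-3 * np.eye(n), y_tr)   # ridge solve
    y_hat = ntk(X_test, X_tr) @ alpha
    print(f"n = {n:4d}   test MSE = {np.mean((y_hat - y_test) ** 2):.4f}")
```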