Generalization Ability of Wide Neural Networks on $\mathbb{R}$
- URL: http://arxiv.org/abs/2302.05933v1
- Date: Sun, 12 Feb 2023 15:07:27 GMT
- Title: Generalization Ability of Wide Neural Networks on $\mathbb{R}$
- Authors: Jianfa Lai, Manyun Xu, Rui Chen and Qian Lin
- Abstract summary: We study the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$.
We show that: $i)$ when the width $m\rightarrow\infty$, the neural network kernel (NNK) uniformly converges to the NTK; $ii)$ the minimax rate of regression over the RKHS associated to $K_1$ is $n^{-2/3}$; $iii)$ if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; $iv)$ if one trains the neural network until it overfits the data, it cannot generalize well.
- Score: 8.508360765158326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We perform a study on the generalization ability of the wide two-layer ReLU
neural network on $\mathbb{R}$. We first establish some spectral properties of
the neural tangent kernel (NTK): $a)$ $K_{d}$, the NTK defined on
$\mathbb{R}^{d}$, is positive definite; $b)$ $\lambda_{i}(K_{1})$, the $i$-th
largest eigenvalue of $K_{1}$, is proportional to $i^{-2}$. We then show that:
$i)$ when the width $m\rightarrow\infty$, the neural network kernel (NNK)
uniformly converges to the NTK; $ii)$ the minimax rate of regression over the
RKHS associated to $K_{1}$ is $n^{-2/3}$; $iii)$ if one adopts the early
stopping strategy in training a wide neural network, the resulting neural
network achieves the minimax rate; $iv)$ if one trains the neural network till
it overfits the data, the resulting neural network cannot generalize well.
Finally, we provide an explanation that reconciles our theory with the widely
observed ``benign overfitting phenomenon''.
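The following is a small numerical sketch of statements $a)$, $b)$, $i)$, $iii)$ and $iv)$ for one common two-layer parameterization, $f(x)=m^{-1/2}\sum_{r}a_r\,\mathrm{relu}(w_r x+b_r)$ with standard normal initialization; the parameterization, the closed-form kernel constants, the target function, the noise level and the step size are assumptions of the sketch and may differ from the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def nnk(x, m):
    """Empirical neural network kernel (NNK) at a random initialization:
    K_m(x, x') = <grad_theta f(x), grad_theta f(x')> for the width-m net
    f(x) = m^{-1/2} * sum_r a_r * relu(w_r * x + b_r)."""
    w, b, a = (rng.standard_normal(m) for _ in range(3))
    pre = np.outer(x, w) + b                       # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                     # relu(w_r x + b_r)
    ind = (pre > 0).astype(float)                  # relu'(w_r x + b_r)
    g_a = act / np.sqrt(m)                         # d f / d a_r
    g_w = (a * ind) * x[:, None] / np.sqrt(m)      # d f / d w_r
    g_b = (a * ind) / np.sqrt(m)                   # d f / d b_r
    return g_a @ g_a.T + g_w @ g_w.T + g_b @ g_b.T

def ntk(x):
    """Infinite-width NTK of the same parameterization, via the standard
    arc-cosine kernel formulas applied to the lifted inputs (x, 1)."""
    xt = np.stack([x, np.ones_like(x)], axis=1)
    nrm = np.linalg.norm(xt, axis=1)
    inner = xt @ xt.T
    cos = np.clip(inner / np.outer(nrm, nrm), -1.0, 1.0)
    theta = np.arccos(cos)
    relu_relu = np.outer(nrm, nrm) * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)
    step_step = (np.pi - theta) / (2 * np.pi)
    return relu_relu + step_step * inner

x = np.linspace(-1.0, 1.0, 200)
K = ntk(x)        # a) this Gram matrix is positive definite (all eigenvalues below are positive)

# i) the NNK converges uniformly to the NTK as the width m grows.
for m in (100, 1_000, 10_000):
    print(f"m = {m:6d}   sup|NNK - NTK| = {np.abs(nnk(x, m) - K).max():.4f}")

# b) eigenvalue decay lambda_i ~ i^{-2}, i.e. slope close to -2 on a log-log scale.
eig = np.sort(np.linalg.eigvalsh(K))[::-1]
i = np.arange(5, 80)
slope, _ = np.polyfit(np.log(i + 1.0), np.log(eig[i]), 1)
print("fitted eigenvalue decay exponent:", round(slope, 2))

# iii)/iv) early stopping vs. training to (near-)interpolation, with kernel
# gradient descent as a proxy for the wide network.  As a stand-in for the test
# error we track the error against the noise-free target at the training inputs.
y_clean = np.sin(np.pi * x)
y = y_clean + 0.3 * rng.standard_normal(x.size)
alpha = np.zeros(x.size)                           # f_t(.) = sum_j alpha_j K(., x_j)
lr = 1.0 / eig[0]                                  # stable step size
risks = []
for t in range(40_000):
    alpha += lr * (y - K @ alpha)                  # gradient descent on the square loss
    if t % 200 == 0:
        risks.append(np.mean((K @ alpha - y_clean) ** 2))
print("best early-stopped risk :", round(min(risks), 4))
print("risk near interpolation :", round(risks[-1], 4))   # typically clearly larger
```

Kernel gradient descent stands in for training the wide network itself, which is what statement $i)$ is meant to justify; stopping at the minimum of the tracked risk imitates the early-stopped estimator of $iii)$, while running until the noisy labels are interpolated imitates the overfitted network of $iv)$.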
Related papers
- On the Impacts of the Random Initialization in the Neural Tangent Kernel Theory [10.360517127652185]
It is well known that as the network's width tends to infinity, the neural network with random initialization converges to a Gaussian process $f^{\mathrm{GP}}$.
To adopt the traditional theory of kernel regression, most recent works introduced a special mirrored architecture to ensure the network's output is identically zero at initialization.
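A minimal sketch of that mirroring idea (the exact architecture and scaling of the cited paper may differ): two copies of the hidden layer share one random draw but enter the output with opposite signs, so the output is identically zero at initialization while the parameter statistics are those of a standard random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 64                                   # hidden width of each copy (arbitrary)

# Shared random draw for the two mirrored copies.
w, b, a = (rng.standard_normal(m) for _ in range(3))
params = {"w1": w.copy(), "b1": b.copy(), "a1": a.copy(),
          "w2": w.copy(), "b2": b.copy(), "a2": a.copy()}

def mirrored(x, p):
    """Two-layer ReLU net minus its mirrored copy; the two halves cancel at
    initialization and only separate once training updates the parameters."""
    h1 = np.maximum(np.outer(x, p["w1"]) + p["b1"], 0.0)
    h2 = np.maximum(np.outer(x, p["w2"]) + p["b2"], 0.0)
    return (h1 @ p["a1"] - h2 @ p["a2"]) / np.sqrt(2 * m)

x = np.linspace(-1.0, 1.0, 5)
print(mirrored(x, params))               # exactly zero for every input at init
```

During training the two copies receive different gradients and separate, so the construction removes the random function at initialization without giving up the usual random-initialization statistics of the parameters.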
arXiv Detail & Related papers (2024-10-08T02:22:50Z)
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z)
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z)
- Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
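Below is a schematic of the layerwise training procedure mentioned above; the inner polynomial $p$, the outer link, the widths, the step counts and the learning rate are all invented for this sketch and are not the cited paper's setting.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n = 8, 2_000

# Hierarchical target h(x) = g(p(x)): p is a degree-3 polynomial of the
# Gaussian input and g a simple scalar link (both arbitrary for this sketch).
X = torch.randn(n, d)
p = X[:, 0] * X[:, 1] * X[:, 2]
y = torch.tanh(p).unsqueeze(1)

net = nn.Sequential(nn.Linear(d, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 1))
loss_fn = nn.MSELoss()

# Layerwise gradient descent: optimize one layer at a time on the square loss
# while the other layers stay frozen.
for stage, layer in enumerate([net[0], net[2], net[4]]):
    opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
    for _ in range(1_000):
        net.zero_grad(set_to_none=True)
        loss = loss_fn(net(X), y)
        loss.backward()
        opt.step()
    print(f"stage {stage}: train mse = {loss.item():.4f}")
```

The point of the sketch is the sequential, block-by-block optimization schedule rather than the error this particular toy run reaches.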
arXiv Detail & Related papers (2023-11-23T02:19:32Z)
- Rates of Approximation by ReLU Shallow Neural Networks [8.22379888383833]
We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the Hölder space.
Such rates are very close to the optimal one $O(m^{-\frac{r}{d}})$ in the sense that $\frac{d+2}{d+4}$ is close to $1$ when the dimension $d$ is large.
arXiv Detail & Related papers (2023-07-24T00:16:50Z)
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^{*}\sim\sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
- Neural Networks Efficiently Learn Low-Dimensional Representations with SGD [22.703825902761405]
We show that SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle)+\epsilon$ by recovering the principal direction.
We also provide compression guarantees for NNs using the approximate low-rank structure produced by SGD.
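A small end-to-end sketch of this single-index setting; the link function ($\mathrm{relu}$), dimensions, width, learning rate and the alignment check via the weight movement are illustrative choices, not the cited paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, width = 10, 2_000, 64

# Single-index data y = f(<u, x>) + noise with a hidden direction u.
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
X = rng.standard_normal((n, d))
y = np.maximum(X @ u, 0.0) + 0.1 * rng.standard_normal(n)

# Two-layer ReLU network f_hat(x) = a . relu(W x + b), trained by plain SGD
# on the squared loss (manual backprop, no framework needed at this size).
W = rng.standard_normal((width, d)) / np.sqrt(d)
b = np.zeros(width)
a = rng.standard_normal(width) / np.sqrt(width)
W0 = W.copy()
lr = 0.02

for _ in range(20):                        # epochs
    for idx in rng.permutation(n):
        x_i, y_i = X[idx], y[idx]
        pre = W @ x_i + b
        h = np.maximum(pre, 0.0)
        err = h @ a - y_i                  # residual of the current prediction
        g = (pre > 0) * a                  # gradient flowing through the ReLU
        a -= lr * err * h
        W -= lr * err * np.outer(g, x_i)
        b -= lr * err * g

# The movement of the first-layer weights should concentrate along u; compare
# the top singular direction of (W - W0) with the hidden index direction.
_, _, vt = np.linalg.svd(W - W0)
print("alignment |<v1, u>| =", abs(vt[0] @ u))   # close to 1 when u is recovered
```

The movement $W - W_0$ is inspected rather than $W$ itself so that the random initialization does not mask the learned direction.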
arXiv Detail & Related papers (2022-09-29T15:29:10Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably be learned by gradient descent, with loss close to that of the ground-truth network, beyond the Neural Tangent Kernel regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representations can achieve improved sample complexity compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.