The Rate of Convergence of Variation-Constrained Deep Neural Networks
- URL: http://arxiv.org/abs/2106.12068v1
- Date: Tue, 22 Jun 2021 21:28:00 GMT
- Title: The Rate of Convergence of Variation-Constrained Deep Neural Networks
- Authors: Gen Li, Yuantao Gu, Jie Ding
- Abstract summary: We show that a class of variation-constrained neural networks can achieve near-parametric rate $n-1/2+delta$ for an arbitrarily small constant $delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
- Score: 35.393855471751756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-layer feedforward networks have been used to approximate a wide range
of nonlinear functions. An important and fundamental problem is to understand
the learnability of a network model through its statistical risk, or the
expected prediction error on future data. To the best of our knowledge, the
rate of convergence of neural networks shown by existing works is bounded by at
most the order of $n^{-1/4}$ for a sample size of $n$. In this paper, we show
that a class of variation-constrained neural networks, with arbitrary width,
can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small
positive constant $\delta$. It is equivalent to $n^{-1 +2\delta}$ under the
mean squared error. This rate is also observed by numerical experiments. The
result indicates that the neural function space needed for approximating smooth
functions may not be as large as what is often perceived. Our result also
provides insight to the phenomena that deep neural networks do not easily
suffer from overfitting when the number of neurons and learning parameters
rapidly grow with $n$ or even surpass $n$. We also discuss the rate of
convergence regarding other network parameters, including the input dimension,
network layer, and coefficient norm.
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Nonparametric regression using over-parameterized shallow ReLU neural networks [10.339057554827392]
We show that neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes.
It is assumed that the regression function is from the H"older space with smoothness $alpha(d+3)/2$ or a variation space corresponding to shallow neural networks.
As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks.
arXiv Detail & Related papers (2023-06-14T07:42:37Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P* sim sqrtN$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Normalization effects on deep neural networks [20.48472873675696]
We study the effect of the choice of the $gamma_i$ on the statistical behavior of the neural network's output.
We find that in terms of variance of the neural network's output and test accuracy the best choice is to choose the $gamma_i$s to be equal to one.
arXiv Detail & Related papers (2022-09-02T17:05:55Z) - Correlation Functions in Random Fully Connected Neural Networks at
Finite Width [17.51364577113718]
This article considers fully connected neural networks with Gaussian random weights and biases and $L$ hidden layers.
For bounded non-linearities we give sharp recursion estimates in powers of $1/n$ for the joint correlation functions of the network output and its derivatives.
We find in both cases that the depth-to-width ratio $L/n$ plays the role of an effective network depth, controlling both the scale of fluctuations at individual neurons and the size of inter-neuron correlations.
arXiv Detail & Related papers (2022-04-03T11:57:18Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural kernel (NTK)
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training, then its Sobolev-seminorm is lower-bounded.
Experiments reveal for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - Towards Understanding Hierarchical Learning: Benefits of Neural
Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.