Neural networks: deep, shallow, or in between?
- URL: http://arxiv.org/abs/2310.07190v1
- Date: Wed, 11 Oct 2023 04:50:28 GMT
- Title: Neural networks: deep, shallow, or in between?
- Authors: Guergana Petrova and Przemyslaw Wojtaszczyk
- Abstract summary: We give estimates for the error of approximation of a compact subset from a Banach space by the outputs of feed-forward neural networks with width W, depth l and Lipschitz activation functions.
We show that, modulo logarithmic factors, rates better than entropy numbers' rates are possibly attainable only for neural networks for which the depth l goes to infinity.
- Score: 0.6043356028687779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give estimates from below for the error of approximation of a compact
subset from a Banach space by the outputs of feed-forward neural networks with
width W, depth l and Lipschitz activation functions. We show that, modulo
logarithmic factors, rates better than entropy numbers' rates are possibly
attainable only for neural networks for which the depth l goes to infinity, and
that there is no gain if we fix the depth and let the width W go to infinity.
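Stated schematically (the notation and, in particular, the index n below are illustrative assumptions, not the paper's exact theorem): write $E_{W,l}(K)_X$ for the error of approximating a compact set $K \subset X$ by outputs of width-$W$, depth-$l$ networks, and $\varepsilon_n(K)_X$ for the entropy numbers of $K$. A lower bound of the following shape captures the message.

```latex
% Schematic form of a lower bound via entropy numbers.  The index
% n ~ W^2 l (roughly the parameter count) and the logarithmic factors
% are assumptions for exposition, not the paper's precise statement.
\[
  E_{W,l}(K)_X \;\gtrsim\; \varepsilon_{n}(K)_X,
  \qquad n \;\approx\; C\, W^{2} l \cdot (\text{log factors}).
\]
```

Read this way, beating the entropy-number rate (modulo logarithms) requires the depth l to grow, while fixing l and letting the width W go to infinity brings no gain, which is the dichotomy stated in the abstract.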
Related papers
- On the Neural Tangent Kernel of Equilibrium Models [72.29727250679477]
This work studies the neural tangent kernel (NTK) of the deep equilibrium (DEQ) model.
We show that, in contrast, a DEQ model still enjoys a deterministic NTK even when its width and depth go to infinity at the same time, under mild conditions.
arXiv Detail & Related papers (2023-10-21T16:47:18Z)
- How Many Neurons Does it Take to Approximate the Maximum? [10.995895410470279]
We study the size of a neural network needed to approximate the maximum function over $d$ inputs.
We provide new lower and upper bounds on the width required for approximation across various depths.
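For intuition only (a standard exact ReLU identity, not the paper's construction or bounds): since max(a, b) = b + ReLU(a - b), a ReLU network can compute the maximum of d inputs with a binary tree of pairwise maxima, i.e. with width O(d) and depth O(log d); the paper quantifies how far such width/depth trade-offs can be pushed. A minimal Python sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pairwise_max(a, b):
    # Exact ReLU identity: max(a, b) = b + ReLU(a - b).
    return b + relu(a - b)

def relu_tree_max(x):
    """Maximum of the entries of x via a binary tree of pairwise ReLU maxima.

    Each level halves the number of values, so the tree has depth O(log d)
    and uses O(d) pairwise-max gadgets overall.  (Illustrative only; the
    paper studies how small such networks -- exact or approximate -- can be.)
    """
    vals = list(x)
    while len(vals) > 1:
        nxt = [pairwise_max(vals[i], vals[i + 1])
               for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2 == 1:          # carry an unpaired value to the next level
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

x = np.random.randn(13)
assert np.isclose(relu_tree_max(x), x.max())
```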
arXiv Detail & Related papers (2023-07-18T12:47:35Z)
- Width and Depth Limits Commute in Residual Networks [26.97391529844503]
We show that taking the width and depth to infinity in a deep neural network with skip connections results in the same covariance structure no matter how that limit is taken.
This explains why the standard infinite-width-then-depth approach provides practical insights even for networks with depth of the same order as width.
We conduct extensive simulations that show an excellent match with our theoretical findings.
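A toy version of such a simulation (the architecture, the 1/sqrt(depth) scaling of the residual branch, and all sizes below are assumptions chosen for illustration, not the paper's setup): estimate the covariance of the network output at two inputs over many random initialisations, for different width/depth combinations, and compare.

```python
import numpy as np

def resnet_outputs(x1, x2, width, depth, n_samples=500, seed=0):
    """Outputs f(x1), f(x2) of a random residual MLP over weight draws.

    Illustrative architecture: h <- h + W relu(h) / sqrt(depth), with
    i.i.d. Gaussian weights of variance 1/fan_in.
    """
    rng = np.random.default_rng(seed)
    d_in = len(x1)
    outs = np.zeros((n_samples, 2))
    for s in range(n_samples):
        W_in = rng.normal(0.0, 1.0 / np.sqrt(d_in), (width, d_in))
        h = np.stack([W_in @ x1, W_in @ x2], axis=1)           # shape (width, 2)
        for _ in range(depth):
            W = rng.normal(0.0, 1.0 / np.sqrt(width), (width, width))
            h = h + (W @ np.maximum(h, 0.0)) / np.sqrt(depth)  # residual block
        w_out = rng.normal(0.0, 1.0 / np.sqrt(width), width)
        outs[s] = w_out @ h
    return outs

x1, x2 = np.array([1.0, 0.0]), np.array([0.5, 0.5])
for width, depth in [(256, 4), (32, 32)]:
    f = resnet_outputs(x1, x2, width, depth)
    print(width, depth, np.cov(f.T)[0, 1])   # empirical Cov(f(x1), f(x2))
```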
arXiv Detail & Related papers (2023-02-01T13:57:32Z)
- Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z)
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets, on the order of $P^* \sim \sqrt{N}$, for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
- The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective [34.67386186205545]
This paper decouples capacity and width by generalizing neural networks to Deep Gaussian Processes (Deep GP).
Surprisingly, we prove that even the nonparametric Deep GP converges to a Gaussian process, effectively becoming shallower without any increase in representational power.
We find there is a "sweet spot" that maximizes test set performance before the limiting GP behavior prevents adaptability, occurring at width = 1 or width = 2 for nonparametric Deep GP.
arXiv Detail & Related papers (2021-06-11T17:58:58Z)
- Size and Depth Separation in Approximating Natural Functions with Neural Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z)
- Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through the addition of a computationally tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
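A rough sketch of that idea in the randomised-prior spirit (random ReLU features stand in for a wide network, and the exact form of the added function is an assumption for illustration, not the paper's construction): each member is trained so that its trainable part plus a fixed, untrainable random function fits the data, and the untrainable function is kept at prediction time.

```python
import numpy as np

# Toy 1-D regression data.
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 20)[:, None]
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=20)
Xt = np.linspace(-1.5, 1.5, 200)[:, None]           # test inputs, incl. extrapolation

def member_prediction(seed, n_feat=300, ridge=1e-2, prior_scale=1.0):
    """One ensemble member: ridge regression on random ReLU features
    (a linearised stand-in for a wide network), plus a fixed, untrainable
    random function added to its output."""
    r = np.random.default_rng(seed)
    W, b = r.normal(size=(n_feat, 1)), r.normal(size=n_feat)
    feats = lambda Z: np.maximum(Z @ W.T + b, 0.0) / np.sqrt(n_feat)

    v = prior_scale * r.normal(size=n_feat)          # untrainable random weights
    prior = lambda Z: feats(Z) @ v                   # the added, untrainable function

    Phi = feats(X)
    # Train the trainable part so that (trainable + untrainable) fits y.
    w = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(n_feat),
                        Phi.T @ (y - prior(X)))
    return feats(Xt) @ w + prior(Xt)

preds = np.stack([member_prediction(s) for s in range(20)])
mean, spread = preds.mean(axis=0), preds.std(axis=0)
print(spread[100], spread[0])   # ensemble spread near the data vs. far from it
```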
arXiv Detail & Related papers (2020-07-11T22:10:52Z)
- Approximation in shift-invariant spaces with deep ReLU neural networks [7.7084107194202875]
We study the expressive power of deep ReLU neural networks for approximating functions in dilated shift-invariant spaces.
Approximation error bounds are estimated with respect to the width and depth of neural networks.
arXiv Detail & Related papers (2020-05-25T07:23:47Z)
- On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
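A concrete, if simplified, handle on what "convergence of the NTK as the width grows" means (a plain two-layer ReLU net rather than the ResNet/DenseNet architectures the paper analyses, so purely illustrative): compute the empirical NTK at initialisation for increasing widths and watch the value settle.

```python
import numpy as np

def empirical_ntk(x1, x2, width, seed=0):
    """Empirical NTK <grad_theta f(x1), grad_theta f(x2)> at initialisation
    for a two-layer ReLU net f(x) = a . relu(W x) / sqrt(width).
    (A plain MLP for illustration; the paper treats ResNets and DenseNets.)"""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(width, x1.size))
    a = rng.normal(size=width)
    z1, z2 = W @ x1, W @ x2
    h1, h2 = np.maximum(z1, 0.0), np.maximum(z2, 0.0)
    d1, d2 = (z1 > 0).astype(float), (z2 > 0).astype(float)
    k_a = h1 @ h2 / width                                  # gradients w.r.t. a
    k_W = (a * a * d1 * d2).sum() * (x1 @ x2) / width      # gradients w.r.t. W
    return k_a + k_W

x1, x2 = np.array([1.0, 0.5]), np.array([0.2, -1.0])
for m in (64, 256, 1024, 4096, 16384):
    print(m, empirical_ntk(x1, x2, m, seed=m))   # values settle as the width grows
```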
arXiv Detail & Related papers (2020-01-28T16:47:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.