Neural Networks with Small Weights and Depth-Separation Barriers
- URL: http://arxiv.org/abs/2006.00625v4
- Date: Sun, 27 Dec 2020 20:33:40 GMT
- Title: Neural Networks with Small Weights and Depth-Separation Barriers
- Authors: Gal Vardi and Ohad Shamir
- Abstract summary: For constant depths, existing results are limited to depths $2$ and $3$, and achieving results for higher depths has been an important open question.
We focus on feedforward ReLU networks, and prove fundamental barriers to proving such results beyond depth $4$.
- Score: 40.66211670342284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In studying the expressiveness of neural networks, an important question is
whether there are functions which can only be approximated by sufficiently deep
networks, assuming their size is bounded. However, for constant depths,
existing results are limited to depths $2$ and $3$, and achieving results for
higher depths has been an important open question. In this paper, we focus on
feedforward ReLU networks, and prove fundamental barriers to proving such
results beyond depth $4$, by reduction to open problems and natural-proof
barriers in circuit complexity. To show this, we study a seemingly unrelated
problem of independent interest: Namely, whether there are polynomially-bounded
functions which require super-polynomial weights in order to approximate with
constant-depth neural networks. We provide a negative and constructive answer
to that question, by showing that if a function can be approximated by a
polynomially-sized, constant depth $k$ network with arbitrarily large weights,
it can also be approximated by a polynomially-sized, depth $3k+3$ network,
whose weights are polynomially bounded.
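To make the weight-versus-depth trade-off concrete, here is a minimal numpy sketch. It is an illustration only, not the paper's depth-$3k+3$ construction: by positive homogeneity of the ReLU, a single unit with an exponentially large weight $2^k$ is simulated exactly by a chain of $k$ ReLU layers whose weights are all bounded by $2$.

```python
# Toy illustration (not the paper's construction): a ReLU unit with a huge
# weight 2**k equals a depth-k chain of ReLU layers with weights bounded by 2,
# since ReLU(2*z) = 2*ReLU(z) for every real z.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def big_weight_unit(x, k):
    """One ReLU unit with an exponentially large weight 2**k."""
    return relu((2.0 ** k) * x)

def bounded_weight_network(x, k):
    """Depth-k chain of ReLU layers, every weight equal to 2."""
    h = x
    for _ in range(k):
        h = relu(2.0 * h)
    return h

x = np.linspace(-3.0, 3.0, 7)
k = 20
assert np.allclose(big_weight_unit(x, k), bounded_weight_network(x, k))
```

The paper's result is much stronger than this toy: it keeps the depth constant (depth $3k+3$ for a depth-$k$ source network) and the size polynomial, rather than paying one extra layer per doubling of the weight magnitude.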
Related papers
- Asymptotics of Learning with Deep Structured (Random) Features [9.366617422860543]
For a large class of feature maps we provide a tight characterisation of the test error associated with learning the readout layer.
In some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
arXiv Detail & Related papers (2024-02-21T18:35:27Z) - Depth Separation in Norm-Bounded Infinite-Width Neural Networks [55.21840159087921]
We study depth separation in infinite-width neural networks, where complexity is controlled by the overall squared $\ell_2$-norm of the weights.
We show that there are functions that are learnable with sample complexity polynomial in the input dimension by norm-controlled depth-3 ReLU networks, yet are not learnable with sub-exponential sample complexity by norm-controlled depth-2 ReLU networks.
arXiv Detail & Related papers (2024-02-13T21:26:38Z) - Depth Separation with Multilayer Mean-Field Networks [14.01059700772468]
We show that the function constructed in arXiv:1904.06984 is easy to approximate using a 3-layer network but not approximable by any 2-layer network.
Our result relies on a new way of extending the mean-field limit to multilayer networks.
arXiv Detail & Related papers (2023-04-03T15:18:16Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Rosenblatt's first theorem and frugality of deep learning [0.0]
Rosenblatt's theorem about the omnipotence of shallow networks states that elementary perceptrons can solve any classification problem if there are no discrepancies in the training set.
Minsky and Papert considered elementary perceptrons with restrictions on the neural inputs: a bounded number of connections or a relatively small diameter of the receptive field for each neuron in the hidden layer.
In this note, we demonstrate Rosenblatt's first theorem at work, show how an elementary perceptron can solve a version of the travel maze problem, and analyse the complexity of that solution.
arXiv Detail & Related papers (2022-08-29T09:44:27Z) - Width is Less Important than Depth in ReLU Neural Networks [40.83290846983707]
We show that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network.
We extend our results to constructing networks with bounded weights, and to constructing networks with width at most $d+2$.
arXiv Detail & Related papers (2022-02-08T13:07:22Z) - Size and Depth Separation in Approximating Natural Functions with Neural Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z) - A simple geometric proof for the benefit of depth in ReLU networks [57.815699322370826]
We present a simple proof for the benefit of depth in multi-layer feedforward networks with rectified activation ("depth separation").
We present a concrete neural network with linear depth (in $m$) and small constant width ($\leq 4$) that classifies the problem with zero error; a toy construction in the same spirit is sketched after this list.
arXiv Detail & Related papers (2021-01-18T15:40:27Z) - ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
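Several entries above, like the main paper, revolve around depth separation for ReLU networks. As a self-contained toy (the classic sawtooth construction, not the specific network of any paper listed here), the following numpy sketch composes $m$ width-$2$ ReLU "tent map" layers to obtain a piecewise-linear function with $2^m$ pieces; a shallower network is known to need exponentially more units to produce as many pieces.

```python
# Hedged illustration of the depth-separation theme: the m-fold composition of
# a width-2 ReLU "tent map" has 2**m linear pieces, which shallower networks
# can only match with exponentially many units.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(x):
    """Width-2 ReLU layer: tent(x) = 2x on [0, 1/2] and 2 - 2x on [1/2, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def sawtooth(x, m):
    """Depth-m, width-2 ReLU network: the m-fold composition of the tent map."""
    h = x
    for _ in range(m):
        h = tent(h)
    return h

# Sample on a grid fine enough that every kink (a multiple of 2**-m) is a grid
# point, then count linear pieces by counting slope changes.
x = np.linspace(0.0, 1.0, 2 ** 10 + 1)
for m in (1, 2, 3, 4):
    slopes = np.diff(sawtooth(x, m)) / np.diff(x)
    pieces = 1 + int(np.count_nonzero(np.abs(np.diff(slopes)) > 1e-6))
    print(f"depth {m}: {pieces} linear pieces (expected {2 ** m})")
```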
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.