Depth Separation in Norm-Bounded Infinite-Width Neural Networks
- URL: http://arxiv.org/abs/2402.08808v1
- Date: Tue, 13 Feb 2024 21:26:38 GMT
- Title: Depth Separation in Norm-Bounded Infinite-Width Neural Networks
- Authors: Suzanna Parkinson, Greg Ongie, Rebecca Willett, Ohad Shamir, Nathan
Srebro
- Abstract summary: We study depth separation in infinite-width neural networks, where complexity is controlled by the overall squared $\ell_2$-norm of the weights.
We show that there are functions that are learnable with sample complexity polynomial in the input dimension by norm-controlled depth-3 ReLU networks, yet are not learnable with sub-exponential sample complexity by norm-controlled depth-2 ReLU networks.
- Score: 55.21840159087921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study depth separation in infinite-width neural networks, where complexity
is controlled by the overall squared $\ell_2$-norm of the weights (sum of
squares of all weights in the network). Whereas previous depth separation
results focused on separation in terms of width, such results do not give
insight into whether depth determines if it is possible to learn a network that
generalizes well even when the network width is unbounded. Here, we study
separation in terms of the sample complexity required for learnability.
Specifically, we show that there are functions that are learnable with sample
complexity polynomial in the input dimension by norm-controlled depth-3 ReLU
networks, yet are not learnable with sub-exponential sample complexity by
norm-controlled depth-2 ReLU networks (with any value for the norm). We also
show that a similar statement in the reverse direction is not possible: any
function learnable with polynomial sample complexity by a norm-controlled
depth-2 ReLU network with infinite width is also learnable with polynomial
sample complexity by a norm-controlled depth-3 ReLU network.
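As a concrete illustration of the complexity measure in the abstract, the sketch below computes the overall squared $\ell_2$-norm of the weights (the sum of squares of all weight entries) for small finite-width depth-2 and depth-3 ReLU networks. This is a minimal sketch assuming PyTorch; the input dimension, the widths, and the choice to exclude biases are illustrative assumptions, not taken from the paper, which studies the infinite-width limit where only this norm constrains the function class.

```python
import torch
from torch import nn

def squared_l2_norm(model: nn.Module) -> float:
    """Sum of squares of all weight-matrix entries in the network.

    Biases are excluded here; whether they count toward the norm is a
    modeling choice made only for this illustration.
    """
    return sum((p ** 2).sum().item()
               for name, p in model.named_parameters()
               if "bias" not in name)

torch.manual_seed(0)
d, width = 16, 64  # illustrative input dimension and hidden width

# Depth-2 ReLU network: one hidden layer.
depth2 = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))

# Depth-3 ReLU network: two hidden layers.
depth3 = nn.Sequential(nn.Linear(d, width), nn.ReLU(),
                       nn.Linear(width, width), nn.ReLU(),
                       nn.Linear(width, 1))

print("depth-2 squared l2-norm of weights:", squared_l2_norm(depth2))
print("depth-3 squared l2-norm of weights:", squared_l2_norm(depth3))
```

In the norm-bounded infinite-width setting, sample complexity is stated in terms of this norm rather than the width, which is what makes a separation between depth-2 and depth-3 networks meaningful even when the width is unbounded.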
Related papers
- On Size-Independent Sample Complexity of ReLU Networks [9.15749739027059]
We study the sample complexity of learning ReLU neural networks from the point of view of generalization.
We estimate the Rademacher complexity of the associated function class; its standard empirical definition is recalled after this list.
arXiv Detail & Related papers (2023-06-03T03:41:33Z) - The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We also analyze two important settings where mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z) - Path Regularization: A Convexity and Sparsity Inducing Regularization
for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can separate two classes of data with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Learning Deep ReLU Networks Is Fixed-Parameter Tractable [21.625005195943707]
We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs.
We give an algorithm whose running time is a fixed polynomial in the ambient dimension.
Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant.
arXiv Detail & Related papers (2020-09-28T17:58:43Z) - Neural Networks with Small Weights and Depth-Separation Barriers [40.66211670342284]
For constant depths, existing results are limited to depths $2$ and $3$, and achieving results for higher depths has been an important question.
We focus on feedforward ReLU networks, and prove fundamental barriers to proving such results beyond depth $4$.
arXiv Detail & Related papers (2020-05-31T21:56:17Z) - Neural Networks are Convex Regularizers: Exact Polynomial-time Convex
Optimization Formulations for Two-layer Networks [70.15611146583068]
We develop exact representations of training two-layer neural networks with rectified linear units (ReLUs).
Our theory utilizes semi-infinite duality and minimum norm regularization.
arXiv Detail & Related papers (2020-02-24T21:32:41Z) - Revealing the Structure of Deep Neural Networks via Convex Duality [70.15611146583068]
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of hidden layers.
We show that a set of optimal hidden layer weights for a norm regularized training problem can be explicitly found as the extreme points of a convex set.
We apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds.
arXiv Detail & Related papers (2020-02-22T21:13:44Z) - Quasi-Equivalence of Width and Depth of Neural Networks [10.365556153676538]
We investigate if the design of artificial neural networks should have a directional preference.
Inspired by the De Morgan law, we establish a quasi-equivalence between the width and depth of ReLU networks.
Based on our findings, a deep network has a wide equivalent, subject to an arbitrarily small error.
arXiv Detail & Related papers (2020-02-06T21:17:32Z)
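For reference, the Rademacher complexity mentioned in the first related-papers entry is, in its standard empirical form, defined as follows. This is a textbook definition, not taken from any of the listed abstracts; $\sigma_1, \dots, \sigma_n$ are i.i.d. uniform $\pm 1$ random variables and $S = (x_1, \dots, x_n)$ is the sample.

```latex
% Empirical Rademacher complexity of a function class \mathcal{F} on a sample S
\[
  \widehat{\mathfrak{R}}_S(\mathcal{F})
  = \mathbb{E}_{\sigma}\left[ \sup_{f \in \mathcal{F}}
      \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right]
\]
```

Size-independent sample complexity bounds of the kind discussed above control this quantity through weight norms rather than through the width or the number of parameters.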
This list is automatically generated from the titles and abstracts of the papers on this site.