Depth Separations in Neural Networks: Separating the Dimension from the Accuracy
- URL: http://arxiv.org/abs/2402.07248v1
- Date: Sun, 11 Feb 2024 17:27:26 GMT
- Title: Depth Separations in Neural Networks: Separating the Dimension from the Accuracy
- Authors: Itay Safran, Daniel Reichman, Paul Valiant
- Abstract summary: We prove an exponential separation between depth 2 and depth 3 neural networks, when approximating an $\mathcal{O}(1)$-Lipschitz target function to constant accuracy.
Our lower bound holds for a wide variety of activation functions, and is based on a novel application of an average- to worst-case random self-reducibility argument.
- Score: 10.995895410470279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove an exponential separation between depth 2 and depth 3 neural
networks, when approximating an $\mathcal{O}(1)$-Lipschitz target function to
constant accuracy, with respect to a distribution with support in $[0,1]^{d}$,
assuming exponentially bounded weights. This addresses an open problem posed in
\citet{safran2019depth}, and proves that the curse of dimensionality manifests
in depth 2 approximation, even in cases where the target function can be
represented efficiently using depth 3. Previously, lower bounds that were used
to separate depth 2 from depth 3 required that at least one of the Lipschitz
parameter, target accuracy or (some measure of) the size of the domain of
approximation scale polynomially with the input dimension, whereas we fix the
former two and restrict our domain to the unit hypercube. Our lower bound holds
for a wide variety of activation functions, and is based on a novel application
of an average- to worst-case random self-reducibility argument, to reduce the
problem to threshold circuit lower bounds.
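In schematic form, the separation reads as follows; this paraphrases the abstract, and the constant $c$, the error norm, and the exact width and weight bounds below are placeholders rather than the paper's precise statement.

% schematic paraphrase, not the paper's exact theorem
\exists\, \varepsilon = \Theta(1),\ \text{a distribution } \mu \text{ on } [0,1]^d,\ \text{and an } \mathcal{O}(1)\text{-Lipschitz } f \colon [0,1]^d \to \mathbb{R}
\text{ realized by a depth-3 network of size } \mathrm{poly}(d), \text{ such that every depth-2 network } g
\text{ with weights at most } \exp(\mathrm{poly}(d)) \text{ and width at most } 2^{cd} \text{ satisfies}
\mathbb{E}_{x \sim \mu}\, \lvert g(x) - f(x) \rvert \;\ge\; \varepsilon.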
Related papers
- Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method [7.723218675113336]
We employ a three-layer tanh neural network within the framework of the deep Ritz method to solve second-order elliptic equations.
We perform projected gradient descent to train the three-layer network and establish its global convergence.
We present an error bound in terms of the sample size $n$, and our work provides guidance on how to set the network depth, width, step size, and number of iterations for the projected gradient descent algorithm (a minimal PGD sketch appears after this list).
arXiv Detail & Related papers (2024-05-19T05:07:09Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Information-Theoretic Generalization Bounds for Deep Neural Networks [22.87479366196215]
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications.
This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds.
arXiv Detail & Related papers (2024-04-04T03:20:35Z) - NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
To address its remaining shortcomings, we present three solutions: semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, has exhibited appealing performance on the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z) - How Many Neurons Does it Take to Approximate the Maximum? [10.995895410470279]
We study the size of a neural network needed to approximate the maximum function over $d$ inputs.
We provide new lower and upper bounds on the width required for approximation across various depths (the classical ReLU construction for the pairwise maximum is sketched after this list).
arXiv Detail & Related papers (2023-07-18T12:47:35Z) - The Implicit Bias of Minima Stability in Multivariate Shallow ReLU
Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z) - Sample Complexity of Nonparametric Off-Policy Evaluation on
Low-Dimensional Manifolds using Deep Networks [71.95722100511627]
We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2022-06-06T20:25:20Z) - Global Convergence of Three-layer Neural Networks in the Mean Field
Regime [3.553493344868413]
In the mean field regime, neural networks are appropriately scaled so that as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit.
Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees.
We prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime (the standard two-layer mean-field scaling is recalled after this list).
arXiv Detail & Related papers (2021-05-11T17:45:42Z) - Learning Deep ReLU Networks Is Fixed-Parameter Tractable [21.625005195943707]
We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs.
We give an algorithm whose running time is a fixed polynomial in the ambient dimension.
Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant.
arXiv Detail & Related papers (2020-09-28T17:58:43Z) - Wasserstein Distances for Stereo Disparity Estimation [62.09272563885437]
Existing approaches to depth or disparity estimation output a distribution over a set of pre-defined discrete values.
This leads to inaccurate results when the true depth or disparity does not match any of these values.
We address these issues using a new neural network architecture that is capable of outputting arbitrary depth values (a toy 1-D Wasserstein computation over disparity bins is sketched after this list).
arXiv Detail & Related papers (2020-07-06T21:37:50Z)
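For "Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method": a minimal sketch of projected gradient descent on a three-layer tanh network. The paper minimizes a Deep Ritz variational energy for a second-order elliptic equation; the plain least-squares loss, the dimensions, and the Frobenius-ball projection radius R below are stand-in assumptions.

import numpy as np

# Toy data: the least-squares loss is a stand-in for the Deep Ritz energy;
# d, width, the radius R, and the learning rate are all assumptions.
rng = np.random.default_rng(0)
d, width, n = 2, 16, 256
R, lr = 5.0, 1e-2

X = rng.uniform(0.0, 1.0, (n, d))
y = np.sin(np.pi * X[:, 0]) * np.sin(np.pi * X[:, 1])

W1 = rng.normal(0.0, 0.5, (d, width))
W2 = rng.normal(0.0, 0.5, (width, width))
w3 = rng.normal(0.0, 0.5, width)

def forward(X):
    h1 = np.tanh(X @ W1)          # first hidden layer
    h2 = np.tanh(h1 @ W2)         # second hidden layer
    return h2 @ w3, (h1, h2)      # linear output layer

def project(M, radius):
    # Euclidean projection onto the Frobenius-norm ball of radius `radius`.
    nrm = np.linalg.norm(M)
    return M if nrm <= radius else M * (radius / nrm)

for step in range(2000):
    pred, (h1, h2) = forward(X)
    err = pred - y                                # d(loss)/d(pred), up to 2/n
    g3 = h2.T @ err / n
    d2 = np.outer(err, w3) * (1.0 - h2**2)        # backprop through tanh
    g2 = h1.T @ d2 / n
    d1 = (d2 @ W2.T) * (1.0 - h1**2)
    g1 = X.T @ d1 / n
    # Gradient step followed by projection -- the "P" in PGD.
    W1 = project(W1 - lr * g1, R)
    W2 = project(W2 - lr * g2, R)
    w3 = project(w3 - lr * g3, R)

print("final MSE:", np.mean((forward(X)[0] - y) ** 2))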
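For "How Many Neurons Does it Take to Approximate the Maximum?": the classical exact construction, max(a, b) = (a + b + |a - b|) / 2 with |t| = relu(t) + relu(-t), composed in a binary tree, computes the maximum of d inputs at depth O(log d). The identity is standard; the paper's contribution is width bounds at each depth, which this sketch does not reproduce.

import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def max2(a, b):
    # max(a, b) = (a + b + |a - b|) / 2, with |t| = relu(t) + relu(-t):
    # two ReLU neurons plus a linear layer compute an exact pairwise max.
    return 0.5 * (a + b + relu(a - b) + relu(b - a))

def relu_max(xs):
    # Binary tree of max2 gates: exact max of d inputs at depth O(log d).
    xs = list(xs)
    while len(xs) > 1:
        nxt = [max2(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:
            nxt.append(xs[-1])    # carry an unpaired element to the next level
        xs = nxt
    return xs[0]

vals = np.random.default_rng(1).normal(size=7)
assert np.isclose(relu_max(vals), vals.max())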
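As background for "Global Convergence of Three-layer Neural Networks in the Mean Field Regime": the standard two-layer mean-field scaling, recalled below, is textbook material rather than the paper's notation; the paper's three-layer setting generalizes it.

f_N(x) \;=\; \frac{1}{N}\sum_{i=1}^{N} a_i\,\sigma(\langle w_i, x\rangle)
\;\xrightarrow[N\to\infty]{}\;
\int a\,\sigma(\langle w, x\rangle)\, d\mu(a, w),

where gradient descent on the weights induces a nonlinear evolution of the parameter distribution $\mu$; this limiting evolution is the "mean field limit" the entry refers to.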
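For "Wasserstein Distances for Stereo Disparity Estimation": a toy computation of the 1-D Wasserstein-1 distance between a predicted categorical distribution over disparity bins and a nearest-bin ground-truth Dirac, using the fact that in one dimension W1 is the L1 distance between CDFs. The bin grid and the soft-argmax readout are illustrative assumptions, not the paper's architecture.

import numpy as np

# Candidate disparity grid and a random predicted distribution over it;
# both the grid and the softmax head are assumptions for illustration.
bins = np.linspace(0.0, 64.0, 65)
logits = np.random.default_rng(2).normal(size=bins.size)
p = np.exp(logits - logits.max())
p /= p.sum()

# Ground truth as a Dirac on the nearest bin (the paper's point is precisely
# that the true disparity need not sit on the grid; this is the toy version).
d_true = 23.7
q = np.zeros_like(p)
q[np.argmin(np.abs(bins - d_true))] = 1.0

def wasserstein1(p, q, support):
    # In 1-D, W1 equals the integral of |CDF_p - CDF_q| over the support.
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(support)))

print("W1 loss:", wasserstein1(p, q, bins))
print("soft-argmax disparity:", float(p @ bins))  # continuous-valued readout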
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.