Rethink Depth Separation with Intra-layer Links
- URL: http://arxiv.org/abs/2305.07037v1
- Date: Thu, 11 May 2023 11:54:36 GMT
- Title: Rethink Depth Separation with Intra-layer Links
- Authors: Feng-Lei Fan, Ze-Yu Li, Huan Xiong, Tieyong Zeng
- Abstract summary: We study the depth separation theory in the context of shortcuts.
We show that a shallow network with intra-layer links does not need to go as wide as before to express some hard functions constructed by a deep network.
Our results supplement the existing depth separation theory by examining its limit in the shortcut domain.
- Score: 23.867032824891723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The depth separation theory is nowadays widely accepted as an effective
explanation for the power of depth, which consists of two parts: i) there
exists a function representable by a deep network; ii) such a function cannot
be represented by a shallow network whose width is lower than a threshold.
However, this theory is established for feedforward networks. Few studies, if
not none, considered the depth separation theory in the context of shortcuts
which are the most common network types in solving real-world problems. Here,
we find that adding intra-layer links can modify the depth separation theory.
First, we report that adding intra-layer links can greatly improve a network's
representation capability through bound estimation, explicit construction, and
functional space analysis. Then, we modify the depth separation theory by
showing that a shallow network with intra-layer links does not need to go as
wide as before to express some hard functions constructed by a deep network.
Such functions include the renowned "sawtooth" functions. Moreover, the saving
of width is up to linear. Our results supplement the existing depth separation
theory by examining its limit in the shortcut domain. Also, the mechanism we
identify can be translated into analyzing the expressivity of popular shortcut
networks such as ResNet and DenseNet, \textit{e.g.}, residual connections
empower a network to represent a sawtooth function efficiently.
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W1,p$ functions and width estimates for approximating $Lp(Omega;mathbbRm)$ for $mgeq1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - Implicit Hypersurface Approximation Capacity in Deep ReLU Networks [0.0]
We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations.
We show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation as its zero contour.
arXiv Detail & Related papers (2024-07-04T11:34:42Z) - Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree $k$s $p$, a three-layer neural network trained via layerwise gradientp descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Rates of Approximation by ReLU Shallow Neural Networks [8.22379888383833]
We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the H"older space.
Such rates are very close to the optimal one $O(m-fracrd)$ in the sense that $fracd+2d+4d+4$ is close to $1$, when the dimension $d$ is large.
arXiv Detail & Related papers (2023-07-24T00:16:50Z) - Understanding Deep Neural Function Approximation in Reinforcement
Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value based algorithm with the $epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L2(mathrmdmu)$-integrable space over a certain averaged measure $mu$, and transforms it to a generalization problem under the non-iid setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z) - Shallow neural network representation of polynomials [91.3755431537592]
We show that $d$-variables of degreeR$ can be represented on $[0,1]d$ as shallow neural networks of width $d+1+sum_r=2Rbinomr+d-1d-1d-1[binomr+d-1d-1d-1[binomr+d-1d-1d-1[binomr+d-1d-1d-1d-1[binomr+d-1d-1d-1d-1
arXiv Detail & Related papers (2022-08-17T08:14:52Z) - Neural Network Architecture Beyond Width and Depth [4.468952886990851]
This paper proposes a new neural network architecture by introducing an additional dimension called height beyond width and depth.
It is shown that neural networks with three-dimensional architectures are significantly more expressive than the ones with two-dimensional architectures.
arXiv Detail & Related papers (2022-05-19T10:29:11Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamic of descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn with gradient loss at most ground with Tangent samples.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Deep Polynomial Neural Networks [77.70761658507507]
$Pi$Nets are a new class of function approximators based on expansions.
$Pi$Nets produce state-the-art results in three challenging tasks, i.e. image generation, face verification and 3D mesh representation learning.
arXiv Detail & Related papers (2020-06-20T16:23:32Z) - Sharp Representation Theorems for ReLU Networks with Precise Dependence
on Depth [26.87238691716307]
We prove sharp-free representation results for neural networks with $D$ ReLU layers under square loss.
Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions.
arXiv Detail & Related papers (2020-06-07T05:25:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.