Rethink Depth Separation with Intra-layer Links
- URL: http://arxiv.org/abs/2305.07037v1
- Date: Thu, 11 May 2023 11:54:36 GMT
- Title: Rethink Depth Separation with Intra-layer Links
- Authors: Feng-Lei Fan, Ze-Yu Li, Huan Xiong, Tieyong Zeng
- Abstract summary: We study the depth separation theory in the context of shortcuts.
We show that a shallow network with intra-layer links does not need to go as wide as before to express some hard functions constructed by a deep network.
Our results supplement the existing depth separation theory by examining its limit in the shortcut domain.
- Score: 23.867032824891723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The depth separation theory is nowadays widely accepted as an effective
explanation for the power of depth, which consists of two parts: i) there
exists a function representable by a deep network; ii) such a function cannot
be represented by a shallow network whose width is lower than a threshold.
However, this theory is established for feedforward networks. Few studies, if
any, have considered the depth separation theory in the context of shortcuts,
which are among the most common network types in solving real-world problems. Here,
we find that adding intra-layer links can modify the depth separation theory.
First, we report that adding intra-layer links can greatly improve a network's
representation capability through bound estimation, explicit construction, and
functional space analysis. Then, we modify the depth separation theory by
showing that a shallow network with intra-layer links does not need to go as
wide as before to express some hard functions constructed by a deep network.
Such functions include the renowned "sawtooth" functions. Moreover, the saving
of width is up to linear. Our results supplement the existing depth separation
theory by examining its limit in the shortcut domain. Also, the mechanism we
identify can be translated into analyzing the expressivity of popular shortcut
networks such as ResNet and DenseNet, e.g., residual connections
empower a network to represent a sawtooth function efficiently.
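The "sawtooth" functions mentioned in the abstract are the classic hard instances for depth separation: composing a one-hidden-layer ReLU "tent" map k times produces a piecewise-linear function with 2^(k-1) teeth, which a deep network represents with O(k) neurons but a shallow feedforward network needs exponential width to express. A minimal NumPy sketch of this standard construction (illustrative only; the function names and coefficients here are not taken from the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # One hidden ReLU layer computing the "tent" map on [0, 1]:
    # h(x) = 2x on [0, 1/2] and h(x) = 2(1 - x) on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def sawtooth(x, depth):
    # Composing the tent map `depth` times yields a piecewise-linear
    # function with 2^(depth - 1) teeth: cheap for a deep network,
    # but requiring exponentially many neurons for a shallow one.
    for _ in range(depth):
        x = hat(x)
    return x
```

For example, `sawtooth(0.25, 2)` hits a peak of the 2-tooth sawtooth, since `hat(0.25) = 0.5` and `hat(0.5) = 1.0`.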
Related papers
- Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets [58.460298576330835]
We study Leaky ResNets, which interpolate between ResNets ($\tilde{L}\to\infty$) and fully connected nets ($\tilde{L}\to 0$).
In the infinite-depth limit, we study 'representation geodesics' $A_p$: continuous paths in representation space (similar to NeuralODEs).
We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work.
arXiv Detail & Related papers (2024-05-27T18:15:05Z) - Network Degeneracy as an Indicator of Training Performance: Comparing
Finite and Infinite Width Angle Predictions [3.04585143845864]
We show that as networks get deeper and deeper, they are more susceptible to becoming degenerate.
We use a simple algorithm that can accurately predict the level of degeneracy for any given fully connected ReLU network architecture.
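The degeneracy this summary refers to can be observed directly: in a randomly initialized fully connected ReLU network, the angle between the hidden representations of two distinct inputs collapses toward zero as depth grows. A small NumPy demonstration of that phenomenon (our own illustration, not the paper's prediction algorithm; width, depth, and initialization scheme are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def angles_through_depth(x, y, width, depth):
    # Propagate two inputs through a randomly initialized fully
    # connected ReLU network (He-style initialization) and record
    # the angle between their hidden representations at each layer.
    angles = []
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(2.0 / len(x)), size=(width, len(x)))
        x, y = np.maximum(W @ x, 0.0), np.maximum(W @ y, 0.0)
        cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)
        angles.append(float(np.arccos(np.clip(cos, -1.0, 1.0))))
    return angles
```

Starting from orthogonal inputs, the recorded angles shrink layer by layer, i.e., the representations become progressively more degenerate with depth.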
arXiv Detail & Related papers (2023-06-02T13:02:52Z) - Depth Separation with Multilayer Mean-Field Networks [14.01059700772468]
We show that a function constructed in arXiv:1904.06984, which is easy to approximate using a 3-layer network, cannot be approximated by any 2-layer network.
Our result relies on a new way of extending the mean-field limit to multilayer networks.
arXiv Detail & Related papers (2023-04-03T15:18:16Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
Rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Identifying Class Specific Filters with L1 Norm Frequency Histograms in
Deep CNNs [1.1278903078792917]
We analyze the final and penultimate layers of Deep Convolutional Networks.
We identify subsets of features that contribute most towards the network's decision for a class.
arXiv Detail & Related papers (2021-12-14T19:40:55Z) - The Connection Between Approximation, Depth Separation and Learnability
in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Recursive Multi-model Complementary Deep Fusion for Robust Salient Object
Detection via Parallel Sub-Networks [62.26677215668959]
Fully convolutional networks have shown outstanding performance in the salient object detection (SOD) field.
This paper proposes a "wider" network architecture which consists of parallel sub-networks with totally different network architectures.
Experiments on several famous benchmarks clearly demonstrate the superior performance, good generalization, and powerful learning ability of the proposed wider framework.
arXiv Detail & Related papers (2020-08-07T10:39:11Z) - Doubly infinite residual neural networks: a diffusion process approach [8.642603456626393]
We show that deep ResNets do not suffer from undesirable forward-propagation properties.
We focus on doubly infinite fully-connected ResNets, for which we consider i.i.d. initializations.
Our results highlight a limited expressive power of doubly infinite ResNets when the unscaled network's parameters are i.i.d. and the residual blocks are shallow.
arXiv Detail & Related papers (2020-07-07T07:45:34Z) - Quasi-Equivalence of Width and Depth of Neural Networks [10.365556153676538]
We investigate if the design of artificial neural networks should have a directional preference.
Inspired by the De Morgan law, we establish a quasi-equivalence between the width and depth of ReLU networks.
Based on our findings, a deep network has a wide equivalent, subject to an arbitrarily small error.
arXiv Detail & Related papers (2020-02-06T21:17:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all generated summaries) and is not responsible for any consequences of its use.