Quasi-Equivalence of Width and Depth of Neural Networks
- URL: http://arxiv.org/abs/2002.02515v7
- Date: Tue, 24 May 2022 01:21:30 GMT
- Title: Quasi-Equivalence of Width and Depth of Neural Networks
- Authors: Feng-Lei Fan, Rongjie Lai, Ge Wang
- Abstract summary: We investigate whether the design of artificial neural networks should have a directional preference.
Inspired by the De Morgan law, we establish a quasi-equivalence between the width and depth of ReLU networks.
Based on our findings, a deep network has a wide equivalent, subject to an arbitrarily small error.
- Score: 10.365556153676538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While classic studies proved that wide networks allow universal
approximation, recent research and successes of deep learning demonstrate the
power of deep networks. Based on symmetry considerations, we investigate whether
the design of artificial neural networks should have a directional preference,
and what the mechanism of interaction is between the width and depth of a
network. Inspired by the De Morgan law, we address this fundamental question by
establishing a quasi-equivalence between the width and depth of ReLU networks
in two aspects. First, we formulate two transforms for mapping an arbitrary
ReLU network to a wide network and a deep network respectively for either
regression or classification, so that essentially the same capability of the
original network can be implemented. Then, we replace the mainstream artificial
neuron type with a quadratic counterpart, and utilize the factorization and
continued fraction representations of the same polynomial function to construct
a wide network and a deep network, respectively. Based on our findings, a deep
network has a wide equivalent, and vice versa, subject to an arbitrarily small
error.
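To make the width-depth trade concrete, here is a minimal sketch (not the paper's construction): the same cubic polynomial is evaluated once in factored form, where the linear factors play the role of parallel units in a wide network, and once in a nested sequential form, where each multiply-add plays the role of a layer in a deep network. The paper's deep construction uses continued fraction representations realized with quadratic neurons; Horner's nested form is used below only as a simpler stand-in for sequential evaluation.

```python
import numpy as np

# Same cubic two ways: p(x) = (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6.

def p_wide(x):
    # "Wide" evaluation: all linear factors computed in parallel
    # (one shallow layer of units), then aggregated by a product.
    factors = np.stack([x - 1.0, x - 2.0, x - 3.0])
    return np.prod(factors, axis=0)

def p_deep(x):
    # "Deep" evaluation (Horner stand-in for the paper's continued
    # fractions): one multiply-add per step, applied sequentially.
    out = x - 6.0
    out = out * x + 11.0
    out = out * x - 6.0
    return out

xs = np.linspace(-2.0, 5.0, 8)
assert np.allclose(p_wide(xs), p_deep(xs))  # identical function, two shapes
```

The two functions compute the same polynomial exactly; only the shape of the computation differs, which is the sense in which a wide and a deep realization can be equivalent.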
Related papers
- Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration [62.41329042683779]
We propose a high-accuracy rotation equivariant proximal network that embeds rotation symmetry priors into the deep unfolding framework.
arXiv Detail & Related papers (2023-12-25T11:53:06Z) - The Evolution of the Interplay Between Input Distributions and Linear Regions in Networks [20.97553518108504]
We count the number of convex linear regions in deep ReLU networks.
In particular, we prove that for any one-dimensional input there exists a minimum number of neurons required to express it.
We also unveil the iterative refinement process of decision boundaries in ReLU networks during training.
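As a hedged illustration of region counting (not the authors' method), the sketch below empirically estimates the number of linear regions of a small random ReLU network on a one-dimensional input; every size and seed is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random one-hidden-layer ReLU network f: R -> R with 16 hidden units.
w1, b1 = rng.normal(size=16), rng.normal(size=16)

# Each linear region corresponds to one ReLU activation pattern, so we
# count how often the pattern flips as x sweeps a fine 1-D grid.
xs = np.linspace(-10.0, 10.0, 100001)
patterns = (np.outer(xs, w1) + b1) > 0          # (n, 16) on/off pattern
flips = np.any(patterns[1:] != patterns[:-1], axis=1)
print("linear regions on [-10, 10]:", 1 + int(flips.sum()))
# For a deeper network, concatenate the activation patterns of all
# layers before counting flips; the grid estimate is a lower bound,
# since breakpoints closer together than the grid spacing are missed.
```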
arXiv Detail & Related papers (2023-10-28T15:04:53Z) - Feature Learning and Generalization in Deep Networks with Orthogonal Weights [1.7956122940209063]
Deep neural networks with weights drawn from independent Gaussian distributions can be tuned to criticality.
These networks still exhibit fluctuations that grow linearly with the depth of the network.
We show analytically that rectangular networks with tanh activations and weights drawn from the ensemble of orthogonal matrices have corresponding preactivation fluctuations.
arXiv Detail & Related papers (2023-10-11T18:00:02Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Width is Less Important than Depth in ReLU Neural Networks [40.83290846983707]
We show that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network.
We extend our results to constructing networks with bounded weights, and to constructing networks with width at most $d+2$.
arXiv Detail & Related papers (2022-02-08T13:07:22Z) - Towards Understanding Theoretical Advantages of Complex-Reaction Networks [77.34726150561087]
We show that a class of functions can be approximated by a complex-reaction network using a polynomial number of parameters.
For empirical risk minimization, our theoretical result shows that the critical point set of complex-reaction networks is a proper subset of that of real-valued networks.
arXiv Detail & Related papers (2021-08-15T10:13:49Z) - Adversarial Examples in Multi-Layer Random ReLU Networks [39.797621513256026]
Adversarial examples arise in ReLU networks with independent Gaussian parameters.
Bottleneck layers in the network play a key role: the minimal width up to some point determines scales and sensitivities of mappings computed up to that point.
arXiv Detail & Related papers (2021-06-23T18:16:34Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
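A minimal sketch of this idea (illustrative, not the authors' code): nodes in a small complete DAG aggregate their predecessors' outputs through per-edge scalar gates, and because the gates enter the forward pass smoothly, they could be trained by any autodiff framework alongside the node weights. All sizes and names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 4, 8

# Per-edge logits: sigmoid(edge_logits[i, j]) is the learnable strength
# of connection i -> j in the complete DAG (only i < j is used).
edge_logits = rng.normal(size=(n_nodes, n_nodes))
node_W = rng.normal(size=(n_nodes, dim, dim)) / np.sqrt(dim)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    outs = [x]  # node 0 simply passes the input through
    for j in range(1, n_nodes):
        # Aggregate every predecessor, scaled by its edge gate.
        agg = sum(sigmoid(edge_logits[i, j]) * outs[i] for i in range(j))
        outs.append(np.maximum(0.0, agg @ node_W[j]))  # ReLU node op
    return outs[-1]

y = forward(rng.normal(size=dim))  # smooth in edge_logits, so differentiable
print(y.shape)
```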
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Recursive Multi-model Complementary Deep Fusion forRobust Salient Object
Detection via Parallel Sub Networks [62.26677215668959]
Fully convolutional networks have shown outstanding performance in the salient object detection (SOD) field.
This paper proposes a "wider" network architecture which consists of parallel sub-networks with totally different architectures.
Experiments on several famous benchmarks clearly demonstrate the superior performance, good generalization, and powerful learning ability of the proposed wider framework.
arXiv Detail & Related papers (2020-08-07T10:39:11Z) - A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks [9.89901717499058]
We develop a mathematically rigorous framework for embedding neural networks in the mean field regime.
As the network's widths increase, its learning trajectory is shown to be well captured by a mean field limit.
We prove several properties of large-width multilayer networks.
arXiv Detail & Related papers (2020-01-30T16:43:34Z)