Neural Network Architecture Beyond Width and Depth
- URL: http://arxiv.org/abs/2205.09459v1
- Date: Thu, 19 May 2022 10:29:11 GMT
- Title: Neural Network Architecture Beyond Width and Depth
- Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang
- Abstract summary: This paper proposes a new neural network architecture by introducing an additional dimension called height beyond width and depth.
It is shown that neural networks with three-dimensional architectures are significantly more expressive than the ones with two-dimensional architectures.
- Score: 4.468952886990851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a new neural network architecture by introducing an
additional dimension called height beyond width and depth. Neural network
architectures with height, width, and depth as hyperparameters are called
three-dimensional architectures. It is shown that neural networks with
three-dimensional architectures are significantly more expressive than the ones
with two-dimensional architectures (those with only width and depth as
hyperparameters), e.g., standard fully connected networks. The new network
architecture is constructed recursively via a nested structure, and hence we
call a network with the new architecture nested network (NestNet). A NestNet of
height $s$ is built with each hidden neuron activated by a NestNet of height
$\le s-1$. When $s=1$, a NestNet degenerates to a standard network with a
two-dimensional architecture. It is proved by construction that height-$s$ ReLU
NestNets with $\mathcal{O}(n)$ parameters can approximate Lipschitz continuous
functions on $[0,1]^d$ with an error $\mathcal{O}(n^{-(s+1)/d})$, while the
optimal approximation error of standard ReLU networks with $\mathcal{O}(n)$
parameters is $\mathcal{O}(n^{-2/d})$. Furthermore, such a result is extended
to generic continuous functions on $[0,1]^d$ with the approximation error
characterized by the modulus of continuity. Finally, a numerical example is
provided to explore the advantages of the super approximation power of ReLU
NestNets.
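To make the recursive construction concrete, the sketch below implements the nesting idea in PyTorch: a height-$s$ network whose hidden pre-activations are each passed through a small height-$(s-1)$ NestNet, bottoming out at a plain ReLU when $s=1$. The class names (NestNet, NestAct), layer widths, depths, and the choice to share one scalar subnetwork per activation module are illustrative assumptions, not the paper's constructive proof. Inverting the stated rates, reaching a target error $\varepsilon$ would take on the order of $\varepsilon^{-d/(s+1)}$ parameters for a height-$s$ NestNet versus $\varepsilon^{-d/2}$ for a standard ReLU network.

```python
# Minimal sketch of a NestNet, assuming a simple fully connected realization
# in PyTorch. Names, widths, and depths are illustrative, not the paper's.
import torch
import torch.nn as nn


class NestAct(nn.Module):
    """Activation of a height-s NestNet: each scalar pre-activation is passed
    through a height-(s-1) NestNet mapping R -> R; plain ReLU when s = 1."""

    def __init__(self, height: int, width: int = 4, depth: int = 2):
        super().__init__()
        if height <= 1:
            self.act = nn.ReLU()  # height 1: degenerates to a standard ReLU activation
        else:
            self.act = NestNet(1, 1, height - 1, width, depth)

    def forward(self, x):
        if isinstance(self.act, nn.ReLU):
            return self.act(x)
        shape = x.shape
        # apply the scalar subnetwork elementwise to every pre-activation
        return self.act(x.reshape(-1, 1)).reshape(shape)


class NestNet(nn.Module):
    """Fully connected network whose hidden neurons are activated by NestNets
    of strictly smaller height (height 1 is a standard two-dimensional net)."""

    def __init__(self, d_in: int, d_out: int, height: int, width: int = 4, depth: int = 2):
        super().__init__()
        layers, d = [], d_in
        for _ in range(depth):
            layers += [nn.Linear(d, width), NestAct(height, width, depth)]
            d = width
        layers.append(nn.Linear(d, d_out))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


# Usage: a height-2 NestNet on inputs from [0,1]^3.
model = NestNet(d_in=3, d_out=1, height=2)
out = model(torch.rand(8, 3))  # shape (8, 1)
```

Here the height-$(s-1)$ subnetwork is applied elementwise to every hidden pre-activation, mirroring the abstract's description that each hidden neuron is activated by a NestNet of height $\le s-1$.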
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements belonging to $M$ classes.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m \geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - Implicit Hypersurface Approximation Capacity in Deep ReLU Networks [0.0]
We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations.
We show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation of a hypersurface as its zero contour.
arXiv Detail & Related papers (2024-07-04T11:34:42Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Densely Connected $G$-invariant Deep Neural Networks with Signed
Permutation Representations [6.200483285433661]
We introduce and investigate, for finite groups $G$, $G$-invariant deep neural network ($G$-DNN) architectures with ReLU activation.
The preactivations of the $G$-DNNs are able to transform by signed permutation representations (signed perm-reps) of $G$.
We show that there are far more admissible $G$-DNN architectures than those accessible with the "concatenated ReLU" activation function from the literature.
arXiv Detail & Related papers (2023-03-08T14:35:03Z) - Achieve the Minimum Width of Neural Networks for Universal Approximation [1.52292571922932]
We study the exact minimum width, $w_{\min}$, for the universal approximation property (UAP) of neural networks.
In particular, the critical width, $w^*_{\min}$, for $L^p$-UAP can be achieved by leaky-ReLU networks.
arXiv Detail & Related papers (2022-09-23T04:03:50Z) - Shallow neural network representation of polynomials [91.3755431537592]
We show that polynomials of $d$ variables of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}$.
arXiv Detail & Related papers (2022-08-17T08:14:52Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Exponential Separations in Symmetric Neural Networks [48.80300074254758]
We consider the Relational Network (Santoro et al., 2017) architecture as a natural generalization of the DeepSets (Zaheer et al., 2017) architecture.
Under the restriction to analytic activation functions, we construct a symmetric function acting on sets of size $N$ with elements in dimension $D$.
arXiv Detail & Related papers (2022-06-02T19:45:10Z) - Theory of Deep Convolutional Neural Networks III: Approximating Radial
Functions [7.943024117353317]
We consider a family of deep neural networks consisting of two groups of convolutional layers, a down operator, and a fully connected layer.
The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer.
arXiv Detail & Related papers (2021-07-02T08:22:12Z) - Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - Function approximation by deep neural networks with parameters $\{0,\pm
\frac{1}{2}, \pm 1, 2\}$ [91.3755431537592]
It is shown that $C_\beta$-smooth functions can be approximated by neural networks with parameters $\{0, \pm \frac{1}{2}, \pm 1, 2\}$.
The depth, width and the number of active parameters of constructed networks have, up to a logarithmic factor, the same dependence on the approximation error as the networks with parameters in $[-1,1]$.
arXiv Detail & Related papers (2021-03-15T19:10:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.