On Enhancing Expressive Power via Compositions of Single Fixed-Size ReLU
Network
- URL: http://arxiv.org/abs/2301.12353v2
- Date: Tue, 30 May 2023 22:01:05 GMT
- Title: On Enhancing Expressive Power via Compositions of Single Fixed-Size ReLU
Network
- Authors: Shijun Zhang, Jianfeng Lu, Hongkai Zhao
- Abstract summary: We show that the repeated compositions of a single fixed-size ReLU network exhibit surprising expressive power.
Our results reveal that a continuous-depth network generated via a dynamical system has immense approximation power even if its dynamics function is time-independent.
- Score: 11.66117393949175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the expressive power of deep neural networks through the
framework of function compositions. We demonstrate that the repeated
compositions of a single fixed-size ReLU network exhibit surprising expressive
power, despite the limited expressive capabilities of the individual network
itself. Specifically, we prove by construction that $\mathcal{L}_2\circ
\boldsymbol{g}^{\circ r}\circ \boldsymbol{\mathcal{L}}_1$ can approximate
$1$-Lipschitz continuous functions on $[0,1]^d$ with an error
$\mathcal{O}(r^{-1/d})$, where $\boldsymbol{g}$ is realized by a fixed-size
ReLU network, $\boldsymbol{\mathcal{L}}_1$ and $\mathcal{L}_2$ are two affine
linear maps matching the dimensions, and $\boldsymbol{g}^{\circ r}$ denotes the
$r$-times composition of $\boldsymbol{g}$. Furthermore, we extend such a result
to generic continuous functions on $[0,1]^d$ with the approximation error
characterized by the modulus of continuity. Our results reveal that a
continuous-depth network generated via a dynamical system has immense
approximation power even if its dynamics function is time-independent and
realized by a fixed-size ReLU network.
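
To make the shape of the construction concrete, below is a minimal numerical sketch of $\mathcal{L}_2\circ \boldsymbol{g}^{\circ r}\circ \boldsymbol{\mathcal{L}}_1$: a single fixed-size ReLU network $\boldsymbol{g}$ is applied $r$ times between two affine maps. The hidden width $N$, the values of $d$ and $r$, and the random weights are illustrative placeholders, not the constructive choice from the paper's proof.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Illustrative sizes (assumptions, not the paper's construction):
# input dimension d, state width N of the single fixed ReLU network g,
# and number of compositions r.
d, N, r = 2, 16, 8
rng = np.random.default_rng(0)

# Affine maps matching the dimensions: L1 : R^d -> R^N and L2 : R^N -> R.
A1, b1 = rng.standard_normal((N, d)), rng.standard_normal(N)
A2, b2 = rng.standard_normal((1, N)), rng.standard_normal(1)

# A single fixed-size ReLU network g : R^N -> R^N (one hidden layer here),
# with placeholder weights; the same weights are reused in every application.
W1, c1 = rng.standard_normal((N, N)) / np.sqrt(N), rng.standard_normal(N)
W2, c2 = rng.standard_normal((N, N)) / np.sqrt(N), rng.standard_normal(N)

def g(z):
    """One pass through the fixed ReLU network (identical weights each call)."""
    return W2 @ relu(W1 @ z + c1) + c2

def model(x):
    """Evaluate L2 ∘ g∘r ∘ L1 at a point x in [0,1]^d."""
    z = A1 @ x + b1          # L1: lift the input to R^N
    for _ in range(r):       # g∘r: r-fold composition of the SAME network
        z = g(z)
    return A2 @ z + b2       # L2: affine readout to a scalar

x = rng.random(d)            # a sample point in [0,1]^d
print(model(x))
```

Because the same weights are reused in all $r$ applications, accuracy in the paper's construction is gained by increasing the number of compositions $r$ (error $\mathcal{O}(r^{-1/d})$ for $1$-Lipschitz targets), not by enlarging $\boldsymbol{g}$; this is also the time-independent (autonomous) dynamical-system view of a continuous-depth network.
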
Related papers
- Implicit Hypersurface Approximation Capacity in Deep ReLU Networks [0.0]
We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations.
We show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation of a hypersurface as its zero contour.
arXiv Detail & Related papers (2024-07-04T11:34:42Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Shallow neural network representation of polynomials [91.3755431537592]
We show that polynomials in $d$ variables of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}$.
arXiv Detail & Related papers (2022-08-17T08:14:52Z) - Expressive power of binary and ternary neural networks [91.3755431537592]
We show that deep sparse ReLU networks with ternary weights and deep ReLU networks with binary weights can approximate $\beta$-Hölder functions on $[0,1]^d$.
arXiv Detail & Related papers (2022-06-27T13:16:08Z) - ReLU Network Approximation in Terms of Intrinsic Parameters [5.37133760455631]
We study the approximation error of ReLU networks in terms of the number of intrinsic parameters.
We design a ReLU network with only three intrinsic parameters to approximate Hölder continuous functions with an arbitrarily small error.
arXiv Detail & Related papers (2021-11-15T18:20:38Z) - On minimal representations of shallow ReLU networks [0.0]
We show that the minimal representation for $f$ uses either $n$, $n+1$, or $n+2$ neurons.
In particular, when the input is one-dimensional, minimal representations always use at most $n+1$ neurons, but in all higher-dimensional settings there are functions for which $n+2$ neurons are needed.
arXiv Detail & Related papers (2021-08-12T10:22:24Z) - Neural networks with superexpressive activations and integer weights [91.3755431537592]
An example of an activation function $\sigma$ is given such that networks with activations $\{\sigma, \lfloor\cdot\rfloor\}$, integer weights, and a fixed architecture can approximate continuous functions.
The range of integer weights required for $\varepsilon$-approximation of Hölder continuous functions is derived.
arXiv Detail & Related papers (2021-05-20T17:29:08Z) - Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse
of Dimensionality on Hölder Class [6.476766717110237]
We construct neural networks with ReLU, sine, and $2^x$ as activation functions.
In addition to their super-expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable.
arXiv Detail & Related papers (2021-02-28T15:57:42Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - Sharp Representation Theorems for ReLU Networks with Precise Dependence
on Depth [26.87238691716307]
We prove sharp dimension-free representation results for neural networks with $D$ ReLU layers under square loss.
Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions.
arXiv Detail & Related papers (2020-06-07T05:25:06Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.