Sharp Representation Theorems for ReLU Networks with Precise Dependence
on Depth
- URL: http://arxiv.org/abs/2006.04048v2
- Date: Sun, 21 Feb 2021 21:51:01 GMT
- Title: Sharp Representation Theorems for ReLU Networks with Precise Dependence
on Depth
- Authors: Guy Bresler and Dheeraj Nagaraj
- Abstract summary: We prove sharp, dimension-free representation results for neural networks with $D$ ReLU layers under square loss.
Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions.
- Score: 26.87238691716307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove sharp dimension-free representation results for neural networks with
$D$ ReLU layers under square loss for a class of functions $\mathcal{G}_D$
defined in the paper. These results capture the precise benefits of depth in
the following sense:
1. The rates for representing the class of functions $\mathcal{G}_D$ via $D$
ReLU layers are sharp up to constants, as shown by matching lower bounds.
2. For each $D$, $\mathcal{G}_{D} \subseteq \mathcal{G}_{D+1}$ and as $D$
grows the class of functions $\mathcal{G}_{D}$ contains progressively less
smooth functions.
3. If $D^{\prime} < D$, then the approximation rate for the class
$\mathcal{G}_D$ achieved by depth $D^{\prime}$ networks is strictly worse than
that achieved by depth $D$ networks.
This constitutes a fine-grained characterization of the representation power
of feedforward networks of arbitrary depth $D$ and number of neurons $N$, in
contrast to existing representation results which either require $D$ growing
quickly with $N$ or assume that the function being represented is highly
smooth. In the latter case similar rates can be obtained with a single
nonlinear layer. Our results confirm the prevailing hypothesis that deeper
networks are better at representing less smooth functions, and indeed, the main
technical novelty is to fully exploit the fact that deep networks can produce
highly oscillatory functions with few activation functions.
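The oscillation point in the last sentence can be made concrete with a standard toy construction (a NumPy sketch, not the paper's $\mathcal{G}_D$ construction or its sharp rates): composing a single three-ReLU "tent" layer $D$ times produces on the order of $2^D$ monotone pieces on $[0,1]$ while using only $3D$ activation functions.

```python
# A minimal sketch (not the paper's construction): composing a 3-neuron ReLU
# "tent" block D times yields roughly 2^D monotone pieces on [0, 1], i.e.
# exponentially many oscillations from only 3*D ReLU activations.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(x):
    # h(x) = 2*relu(x) - 4*relu(x - 1/2) + 2*relu(x - 1): one hidden ReLU layer of width 3
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def deep_tent(x, depth):
    # depth-fold composition: a ReLU network with `depth` hidden layers of width 3
    for _ in range(depth):
        x = tent(x)
    return x

xs = np.linspace(0.0, 1.0, 100001)
for D in (1, 2, 4, 8):
    ys = deep_tent(xs, D)
    slopes = np.sign(np.diff(ys))
    slopes = slopes[slopes != 0]                       # drop flat grid intervals
    oscillations = np.count_nonzero(np.diff(slopes))   # crude count of slope sign changes
    print(f"depth {D}: ~{oscillations + 1} monotone pieces from {3 * D} ReLU units")
```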
Related papers
- Implicit Hypersurface Approximation Capacity in Deep ReLU Networks [0.0]
We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations.
We show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation of a hypersurface as its zero contour.
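As a rough illustration of the zero-contour idea (a toy sketch with random weights and $d=2$, not the constructive approximation from the paper): a width-$(d+1)$ ReLU network maps $\mathbb{R}^d$ to $\mathbb{R}$, and the set where its output changes sign traces an implicit hypersurface.

```python
# Toy sketch (random weights, d = 2): a deep ReLU network of width d+1 maps
# R^d to R, and the set where its output crosses zero is an implicit
# hypersurface. The output is shifted by a constant bias so that the zero
# contour is nonempty on the sampled grid; this is not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)
d, depth = 2, 4
width = d + 1                        # the narrow width considered in the paper

Ws = [rng.normal(size=(width, d))] + [rng.normal(size=(width, width)) for _ in range(depth - 1)]
bs = [rng.normal(size=width) for _ in range(depth)]
w_out = rng.normal(size=width)

def net(x):
    h = x
    for W, b in zip(Ws, bs):
        h = np.maximum(h @ W.T + b, 0.0)   # hidden ReLU layers of width d+1
    return h @ w_out

grid = np.linspace(-2.0, 2.0, 201)
X, Y = np.meshgrid(grid, grid)
vals = net(np.stack([X.ravel(), Y.ravel()], axis=1)).reshape(X.shape)
Z = vals - np.median(vals)               # constant bias shift so the zero level set is hit
crossings = np.count_nonzero(np.diff(np.sign(Z), axis=0)) + np.count_nonzero(np.diff(np.sign(Z), axis=1))
print(f"grid edges crossing the implicit zero contour: {crossings}")
```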
arXiv Detail & Related papers (2024-07-04T11:34:42Z)
- Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, learn a broad class of hierarchical functions.
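A hedged sketch of the layerwise training scheme named above, in PyTorch: the three blocks are updated one at a time on the square loss. The toy target, widths, step size, and step counts are placeholder choices, not the setting analyzed in the paper.

```python
# Hedged sketch of *layerwise* gradient descent on the square loss for a
# three-layer network. The target h(x) = relu(p(x) - 1) with a quadratic p is
# a toy stand-in, not the function class from the paper.
import torch

torch.manual_seed(0)
d, m1, m2, n = 10, 64, 64, 2048

X = torch.randn(n, d)
w_star = torch.randn(d) / d ** 0.5
p = (X @ w_star) ** 2                  # toy degree-2 polynomial feature
y = torch.relu(p - 1.0)                # toy hierarchical target h = g(p(x))

layer1 = torch.nn.Sequential(torch.nn.Linear(d, m1), torch.nn.ReLU())
layer2 = torch.nn.Sequential(torch.nn.Linear(m1, m2), torch.nn.ReLU())
head = torch.nn.Linear(m2, 1)
model = torch.nn.Sequential(layer1, layer2, head)

def train(params, steps, lr):
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(X).squeeze(-1) - y) ** 2).mean()  # square loss
        loss.backward()
        opt.step()
    return loss.item()

# stage 1: update only the first hidden layer; stage 2: only the second; stage 3: only the head
for stage, block in enumerate([layer1, layer2, head], start=1):
    final = train(block.parameters(), steps=500, lr=0.05)
    print(f"stage {stage}: square loss {final:.4f}")
```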
arXiv Detail & Related papers (2023-11-23T02:19:32Z)
- Polynomial Width is Sufficient for Set Representation with High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE).
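Reading "linear + power" and "linear + exponential" literally, a minimal DeepSets-style sketch might look as follows; the exact parameterization in the paper may differ, and the width, power $k$, and outer map here are arbitrary.

```python
# Minimal sketch of the two set-element embeddings named above, read as a
# linear map followed by a power / exponential activation, plugged into a
# DeepSets-style sum-pooling pipeline. Widths and the power k are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 16            # element feature dim, embedding width
W, b = rng.normal(size=(m, d)), rng.normal(size=m)

def embed_lp(x, k=3):
    return (x @ W.T + b) ** k          # linear + power activation (LP)

def embed_le(x):
    return np.exp(x @ W.T + b)         # linear + exponential activation (LE)

def deepsets(X, embed):
    # X: (set_size, d); permutation invariance comes from the sum pooling
    pooled = embed(X).sum(axis=0)
    return np.tanh(pooled).mean()      # stand-in for the final outer network

X = rng.normal(size=(7, d))
for name, emb in [("LP", embed_lp), ("LE", embed_le)]:
    out = deepsets(X, emb)
    out_perm = deepsets(X[rng.permutation(7)], emb)
    print(name, "permutation invariant:", np.isclose(out, out_perm))
```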
arXiv Detail & Related papers (2023-07-08T16:00:59Z)
- On Enhancing Expressive Power via Compositions of Single Fixed-Size ReLU Network [11.66117393949175]
We show that the repeated compositions of a single fixed-size ReLU network exhibit surprising expressive power.
Our results reveal that a continuous-depth network generated via a dynamical system has immense approximation power even if its dynamics function is time-independent.
arXiv Detail & Related papers (2023-01-29T04:12:58Z)
- Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed with Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-i.i.d. setting.
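A minimal sketch of the value-based setup with $\epsilon$-greedy exploration and a two-layer ReLU $Q$-network; the state dimension, width, and $\epsilon$ are placeholders rather than the paper's assumptions.

```python
# Hedged sketch: a two-layer ReLU network as the Q-function approximator with
# epsilon-greedy action selection. Environment, widths, and epsilon are
# placeholders, not the paper's setting.
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions, width = 8, 4, 64
W1 = rng.normal(size=(width, state_dim)) / np.sqrt(state_dim)
b1 = np.zeros(width)
W2 = rng.normal(size=(n_actions, width)) / np.sqrt(width)

def q_values(s):
    # two-layer ReLU network: state -> hidden ReLU features -> one value per action
    return W2 @ np.maximum(W1 @ s + b1, 0.0)

def epsilon_greedy(s, eps):
    # with probability eps explore uniformly, otherwise act greedily w.r.t. Q
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_values(s)))

s = rng.normal(size=state_dim)
actions = [epsilon_greedy(s, eps=0.1) for _ in range(1000)]
print("greedy action:", int(np.argmax(q_values(s))),
      "| empirical action frequencies:", np.bincount(actions, minlength=n_actions) / 1000)
```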
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
- Shallow neural network representation of polynomials [91.3755431537592]
We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}$.
arXiv Detail & Related papers (2022-08-17T08:14:52Z)
- On minimal representations of shallow ReLU networks [0.0]
We show that the minimal representation for $f$ uses either $n$, $n+1$ or $n+2$ neurons.
In particular, when the input is one-dimensional, minimal representations always use at most $n+1$ neurons, but in all higher-dimensional settings there are functions for which $n+2$ neurons are needed.
arXiv Detail & Related papers (2021-08-12T10:22:24Z)
- A deep network construction that adapts to intrinsic dimensionality beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
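The two target constructions $f(x) = g(\phi(x))$ can be written down directly; this sketch uses placeholder choices: a random linear subspace standing in for an embedded submanifold, and a finite set of anchor points standing in for the collection of low-dimensional sets.

```python
# Sketch of the two target constructions f(x) = g(phi(x)) described above,
# with placeholder choices for phi and g (not the paper's assumptions).
import numpy as np

rng = np.random.default_rng(0)
D, k = 100, 3                        # ambient and intrinsic dimensions

# phi_1: orthogonal projection onto a k-dimensional subspace of R^D
basis, _ = np.linalg.qr(rng.normal(size=(D, k)))
phi_proj = lambda x: x @ basis       # coordinates in the low-dimensional subspace

# phi_2: distance from x to a small collection of low-dimensional "sets" (here, points)
anchors = rng.normal(size=(5, D))
phi_dist = lambda x: np.min(np.linalg.norm(x[None, :] - anchors, axis=1))

g = lambda z: np.sin(np.sum(np.atleast_1d(z)))   # an arbitrary smooth outer function

x = rng.normal(size=D)
print("f1(x) =", g(phi_proj(x)), " f2(x) =", g(phi_dist(x)))
```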
arXiv Detail & Related papers (2020-08-06T09:50:29Z)
- On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
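A toy PyTorch sketch of the hypernetwork mechanics: a small network $g$ maps a task embedding to the full weight vector of a target network $f$, so training acts on $g$ and the embedding rather than on $f$ directly. Sizes are arbitrary, and the parameter-count analysis for structured target families from the paper is not reproduced here.

```python
# Toy hypernetwork sketch: g(embedding) generates the weights of a small
# two-layer target network f; gradients flow through the generated weights
# back into g and the embedding. Sizes are arbitrary placeholders.
import torch

emb_dim, d_in, d_hidden, d_out = 8, 32, 64, 1
n_target_params = d_hidden * d_in + d_hidden + d_out * d_hidden + d_out

# hypernetwork: embedding -> flat parameter vector of the target network f
g = torch.nn.Sequential(torch.nn.Linear(emb_dim, 128), torch.nn.ReLU(),
                        torch.nn.Linear(128, n_target_params))

def f(x, flat_params):
    # unpack the generated parameters and run the target two-layer ReLU network
    i = 0
    W1 = flat_params[i:i + d_hidden * d_in].view(d_hidden, d_in); i += d_hidden * d_in
    b1 = flat_params[i:i + d_hidden]; i += d_hidden
    W2 = flat_params[i:i + d_out * d_hidden].view(d_out, d_hidden); i += d_out * d_hidden
    b2 = flat_params[i:i + d_out]
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

task_embedding = torch.randn(emb_dim, requires_grad=True)
x = torch.randn(5, d_in)
y = f(x, g(task_embedding))
y.sum().backward()      # gradients reach g and the embedding through the generated weights
print("output shape:", tuple(y.shape), "| grad on embedding:", task_embedding.grad is not None)
```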
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
- A Corrective View of Neural Networks: Representation, Memorization and Learning [26.87238691716307]
We develop a corrective mechanism for neural network approximation.
We show that two-layer neural networks in the random features regime (RF) can memorize arbitrary labels.
We also consider three-layer neural networks and show that the corrective mechanism yields faster representation rates for smooth radial functions.
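The random-features memorization claim can be checked numerically in a few lines (with arbitrary sample size, width, and data distribution, not the regime or the corrective mechanism from the paper): freeze a random ReLU first layer and fit only the outer weights by least squares.

```python
# Minimal numerical sketch of random-features memorization: with a frozen
# random ReLU first layer and enough features, solving a least-squares problem
# for the outer weights interpolates arbitrary labels.
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 10, 400          # n points in R^d, N >= n random features

X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)              # arbitrary labels

W = rng.normal(size=(N, d))                      # random (frozen) first layer
B = rng.uniform(-1, 1, size=N)
features = np.maximum(X @ W.T + B, 0.0)          # ReLU random features, shape (n, N)

a, *_ = np.linalg.lstsq(features, y, rcond=None) # only the outer weights are fit
print("max |f(x_i) - y_i| =", np.max(np.abs(features @ a - y)))
```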
arXiv Detail & Related papers (2020-02-01T20:51:09Z)