Sharp Representation Theorems for ReLU Networks with Precise Dependence
on Depth
- URL: http://arxiv.org/abs/2006.04048v2
- Date: Sun, 21 Feb 2021 21:51:01 GMT
- Title: Sharp Representation Theorems for ReLU Networks with Precise Dependence
on Depth
- Authors: Guy Bresler and Dheeraj Nagaraj
- Abstract summary: We prove sharp dimension-free representation results for neural networks with $D$ ReLU layers under square loss.
Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions.
- Score: 26.87238691716307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove sharp dimension-free representation results for neural networks with
$D$ ReLU layers under square loss for a class of functions $\mathcal{G}_D$
defined in the paper. These results capture the precise benefits of depth in
the following sense:
1. The rates for representing the class of functions $\mathcal{G}_D$ via $D$
ReLU layers are sharp up to constants, as shown by matching lower bounds.
2. For each $D$, $\mathcal{G}_{D} \subseteq \mathcal{G}_{D+1}$ and as $D$
grows the class of functions $\mathcal{G}_{D}$ contains progressively less
smooth functions.
3. If $D^{\prime} < D$, then the approximation rate for the class
$\mathcal{G}_D$ achieved by depth $D^{\prime}$ networks is strictly worse than
that achieved by depth $D$ networks.
This constitutes a fine-grained characterization of the representation power
of feedforward networks of arbitrary depth $D$ and number of neurons $N$, in
contrast to existing representation results which either require $D$ growing
quickly with $N$ or assume that the function being represented is highly
smooth. In the latter case similar rates can be obtained with a single
nonlinear layer. Our results confirm the prevailing hypothesis that deeper
networks are better at representing less smooth functions, and indeed, the main
technical novelty is to fully exploit the fact that deep networks can produce
highly oscillatory functions with few activation functions.
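The abstract's final point, that deep networks can produce highly oscillatory functions with few activation functions, can be illustrated by the classical tent-map composition (a standard construction in this literature; the paper's own construction may differ): each additional layer adds only two ReLU units, yet doubles the number of oscillations.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tent(x):
    # Tent map on [0, 1] built from two ReLU units:
    # tent(x) = 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_tent(x, depth):
    # Composing the tent map `depth` times gives a depth-`depth`,
    # width-2 ReLU network whose output has 2^(depth-1) triangular
    # peaks on [0, 1]: oscillation count grows exponentially in depth
    # while the number of activation functions grows only linearly.
    for _ in range(depth):
        x = tent(x)
    return x

xs = np.linspace(0.0, 1.0, 1025)
ys = deep_tent(xs, 4)
```

A shallow network with the same number of units can only produce a number of linear pieces linear in its width, which is the intuition behind the depth separations discussed above.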
Related papers
- Implicit Hypersurface Approximation Capacity in Deep ReLU Networks [0.0]
We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations.
We show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation as its zero contour.
arXiv Detail & Related papers (2024-07-04T11:34:42Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - On Expressivity of Height in Neural Networks [29.49793694185358]
We call a neural network characterized by width, depth, and height a 3D network.
We show via bound estimation and explicit construction that given the same number of neurons and parameters, a 3D ReLU network of width $W$, depth $K$, and height $H$ has greater expressive power than a 2D network of width $Htimes W$ and depth $K$.
arXiv Detail & Related papers (2023-05-11T11:54:36Z) - On Enhancing Expressive Power via Compositions of Single Fixed-Size ReLU
Network [11.66117393949175]
We show that the repeated compositions of a single fixed-size ReLU network exhibit surprising expressive power.
Our results reveal that a continuous-depth network generated via a dynamical system has immense approximation power even if its dynamics function is time-independent.
arXiv Detail & Related papers (2023-01-29T04:12:58Z) - Understanding Deep Neural Function Approximation in Reinforcement
Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value based algorithm with the $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-iid setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z) - Shallow neural network representation of polynomials [91.3755431537592]
We show that polynomials of $d$ variables of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}$.
arXiv Detail & Related papers (2022-08-17T08:14:52Z) - On minimal representations of shallow ReLU networks [0.0]
We show that the minimal representation for $f$ uses either $n$, $n+1$ or $n+2$ neurons.
In particular, where the input layer is one-dimensional, minimal representations always use at most $n+1$ neurons but in all higher dimensional settings there are functions for which $n+2$ neurons are needed.
arXiv Detail & Related papers (2021-08-12T10:22:24Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
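As a quick sanity check on the shallow polynomial representation result listed above, the stated width bound $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}$ can be evaluated directly (this sketch assumes the formula as reconstructed from the summary; the function name is hypothetical):

```python
from math import comb

def shallow_poly_width(d, R):
    # Width of a shallow ReLU network sufficient to represent a
    # degree-R polynomial in d variables on [0, 1]^d, per the bound
    # d + 1 + sum_{r=2}^{R} C(r + d - 1, d - 1).
    return d + 1 + sum(comb(r + d - 1, d - 1) for r in range(2, R + 1))

# Example: quadratics in 2 variables need width 2 + 1 + C(3, 1) = 6.
print(shallow_poly_width(2, 2))
```

The summand $\binom{r+d-1}{d-1}$ counts the monomials of exact degree $r$ in $d$ variables, so the bound grows with the dimension of the polynomial space being represented.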
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.