On minimal representations of shallow ReLU networks
- URL: http://arxiv.org/abs/2108.05643v1
- Date: Thu, 12 Aug 2021 10:22:24 GMT
- Title: On minimal representations of shallow ReLU networks
- Authors: S. Dereich and S. Kassing
- Abstract summary: We show that the minimal representation for $f$ uses either $n$, $n+1$ or $n+2$ neurons.
In the particular case where the input layer is one-dimensional, minimal representations always use at most $n+1$ neurons, but in all higher dimensional settings there are functions for which $n+2$ neurons are needed.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The realization function of a shallow ReLU network is a continuous and
piecewise affine function $f:\mathbb R^d\to \mathbb R$, where the domain
$\mathbb R^{d}$ is partitioned by a set of $n$ hyperplanes into cells on which
$f$ is affine. We show that the minimal representation for $f$ uses either $n$,
$n+1$ or $n+2$ neurons and we characterize each of the three cases. In the
particular case where the input layer is one-dimensional, minimal
representations always use at most $n+1$ neurons but in all higher dimensional
settings there are functions for which $n+2$ neurons are needed. Then we show
that the set of minimal networks representing $f$ forms a
$C^\infty$-submanifold $M$ and we derive the dimension and the number of
connected components of $M$. Additionally, we give a criterion for the
hyperplanes that guarantees that all continuous, piecewise affine functions are
realization functions of appropriate ReLU networks.
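As a concrete illustration of the objects above (a minimal NumPy sketch, not code from the paper), the realization function can be written as $f(x)=\sum_{k=1}^{n} a_k\,(\langle w_k,x\rangle+b_k)_+ + c$. The snippet below evaluates this function and checks the well-known positive rescaling of individual neurons, $(w_k,b_k,a_k)\mapsto(t_k w_k, t_k b_k, a_k/t_k)$ with $t_k>0$, which leaves $f$ unchanged and is one reason a given $f$ admits a whole manifold of parameterizations. The dimensions, random parameters and function names are illustrative only.

```python
# Minimal sketch (not from the paper): the realization function of a shallow ReLU
# network, f(x) = sum_k a_k * max(<w_k, x> + b_k, 0) + c, and the positive rescaling
# (w_k, b_k, a_k) -> (t_k * w_k, t_k * b_k, a_k / t_k), t_k > 0, which leaves f unchanged.
import numpy as np

def realization(x, W, b, a, c):
    """Evaluate the network at x; W has shape (n, d), b and a shape (n,), c is a scalar."""
    return a @ np.maximum(W @ x + b, 0.0) + c

rng = np.random.default_rng(0)
d, n = 3, 5                                   # illustrative input dimension and width
W = rng.standard_normal((n, d))
b = rng.standard_normal(n)
a = rng.standard_normal(n)
c = 0.7

# Rescale every neuron by an arbitrary positive factor t_k.
t = rng.uniform(0.5, 2.0, size=n)
W2, b2, a2 = t[:, None] * W, t * b, a / t

x = rng.standard_normal(d)
print(np.isclose(realization(x, W, b, a, c),
                 realization(x, W2, b2, a2, c)))   # True: same realization function
```

Permuting the hidden neurons likewise leaves the realization unchanged.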
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$.
As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
- LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions [54.54897832889028]
We show that for any $K$, there is a "universal set" $U \subset [n]$ of size independent of $n$, such that for any $Q$ and any row $i$, the large attention scores $A_{i,j}$ in row $i$ of $A$ all have $j \in U$.
We empirically show the benefits of our scheme for vision transformers, showing how to train new models that use our universal set while training as well.
arXiv Detail & Related papers (2024-10-07T19:47:13Z)
- Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z)
- Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization [1.189367612437469]
We consider shallow neural networks with one hidden layer, a ReLU activation function, and an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function.
We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$, where $\delta_P$ measures the signal-to-noise ratio of training inputs.
In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function, and show that the sharp value differs from the upper bound obtained for $Q\leq M$ by a
arXiv Detail & Related papers (2023-09-19T07:12:41Z) - Tractability of approximation by general shallow networks [0.0]
We consider approximation of functions of the form $x\mapsto\int_{\mathbb{Y}} G(x, y)\,d\tau(y)$, $x\in\mathbb{X}$, by $G$-networks of the form $x\mapsto \sum_{k=1}^{n} a_k G(x, y_k)$.
We obtain dimension-independent bounds on the degree of approximation in terms of $n$, where the constants involved are also dependent on the dimensions.
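As a hedged illustration of this form (not the construction analyzed in that paper), one way to obtain a $G$-network is to discretize the integral by Monte Carlo: draw centers $y_k\sim\tau$ i.i.d. and use equal weights $a_k=1/n$. The Gaussian kernel, dimensions and sample sizes in the sketch below are arbitrary choices.

```python
# Hypothetical illustration (not from the paper): approximating x -> ∫ G(x, y) dτ(y)
# by the G-network x -> (1/n) * sum_k G(x, y_k) with centers y_k drawn i.i.d. from τ.
# Here G is an arbitrary Gaussian kernel and τ is the standard normal on R^2.
import numpy as np

def G(x, centers):
    """Example kernel G(x, y) = exp(-||x - y||^2), evaluated for a batch of centers."""
    return np.exp(-np.sum((centers - x) ** 2, axis=1))

rng = np.random.default_rng(1)
dim, n = 2, 2000
x = np.array([0.3, -0.1])

# G-network with equal weights a_k = 1/n and sampled centers y_k ~ τ.
centers = rng.standard_normal((n, dim))
g_network_value = G(x, centers).mean()

# Reference value of the integral from a much larger sample.
reference = G(x, rng.standard_normal((500_000, dim))).mean()
print(g_network_value, reference)   # agree up to the O(1/sqrt(n)) Monte Carlo error
```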
arXiv Detail & Related papers (2023-08-07T00:14:46Z)
- Shallow neural network representation of polynomials [91.3755431537592]
We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks whose width is given explicitly in terms of $d$, $R$ and the binomial coefficients $\binom{r+d-1}{d-1}$, $r=2,\dots,R$.
arXiv Detail & Related papers (2022-08-17T08:14:52Z)
- Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances [9.390008801320024]
We show that adding one extra neuron to each layer is sufficient to connect all previously discrete minima into a single manifold.
We show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold.
arXiv Detail & Related papers (2021-05-25T21:19:07Z)
- A deep network construction that adapts to intrinsic dimensionality beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z)
- Sharp Representation Theorems for ReLU Networks with Precise Dependence on Depth [26.87238691716307]
We prove sharp, dimension-free representation results for neural networks with $D$ ReLU layers under square loss.
Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions.
arXiv Detail & Related papers (2020-06-07T05:25:06Z)
- On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.