A deep network construction that adapts to intrinsic dimensionality
beyond the domain
- URL: http://arxiv.org/abs/2008.02545v3
- Date: Mon, 26 Apr 2021 09:05:48 GMT
- Title: A deep network construction that adapts to intrinsic dimensionality
beyond the domain
- Authors: Alexander Cloninger and Timo Klock
- Abstract summary: We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
- Score: 79.23797234241471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via
deep networks with ReLU activation, where $\phi$ is a geometrically intuitive,
dimensionality reducing feature map. We focus on two intuitive and practically
relevant choices for $\phi$: the projection onto a low-dimensional embedded
submanifold and a distance to a collection of low-dimensional sets. We achieve
near optimal approximation rates, which depend only on the complexity of the
dimensionality reducing map $\phi$ rather than the ambient dimension. Since
$\phi$ encapsulates all nonlinear features that are material to the function
$f$, this suggests that deep nets are faithful to an intrinsic dimension
governed by $f$ rather than the complexity of the domain of $f$. In particular,
the prevalent assumption of approximating functions on low-dimensional
manifolds can be significantly relaxed using functions of type $f(x) =
g(\phi(x))$ with $\phi$ representing an orthogonal projection onto the same
manifold.
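To make the function class concrete, the following is a minimal NumPy sketch of the two-layer compositions $f(x) = g(\phi(x))$ for both choices of $\phi$ discussed in the abstract; the particular subspace, anchor points, and link function $g$ are illustrative assumptions, not constructions from the paper.
```python
# Minimal sketch (NumPy only).  The subspace, the anchor points, and the link
# function g below are illustrative choices, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
D, d = 50, 3                       # ambient dimension D, intrinsic dimension d

# phi_1: orthogonal projection onto a d-dimensional linear subspace of R^D
# (the simplest instance of a projection onto an embedded submanifold).
Q, _ = np.linalg.qr(rng.standard_normal((D, d)))   # orthonormal basis, shape (D, d)

def phi_proj(x):
    """Project x onto the subspace spanned by the columns of Q."""
    return Q @ (Q.T @ x)           # shape (D,)

# phi_2: distance to a collection of low-dimensional sets, here a finite set
# of anchor points (a zero-dimensional collection).
anchors = rng.standard_normal((10, D))

def phi_dist(x):
    """Distance from x to the nearest anchor point."""
    return np.min(np.linalg.norm(anchors - x, axis=1))

def g(z):
    """A simple Lipschitz 'link' function applied on top of the feature map."""
    return np.sin(np.sum(np.atleast_1d(z)))

# Two-layer compositions f = g o phi, the function class studied in the paper.
x = rng.standard_normal(D)
print(g(phi_proj(x)))              # f depends on x only through its projection
print(g(phi_dist(x)))              # f depends on x only through a distance
```
In both cases $f$ varies only through the low-dimensional feature $\phi(x)$, which is the property the approximation rates exploit: the network complexity scales with the complexity of $\phi$ rather than with the ambient dimension $D$.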
Related papers
- Implicit Hypersurface Approximation Capacity in Deep ReLU Networks [0.0]
We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations.
We show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation of a hypersurface as its zero contour.
arXiv Detail & Related papers (2024-07-04T11:34:42Z) - Polynomial Width is Sufficient for Set Representation with
High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE); a minimal sketch of these embedding layers appears after this list.
arXiv Detail & Related papers (2023-07-08T16:00:59Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff [12.351756386062291]
We formalize a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps.
For large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_\ell$ have $R^{(0)}(f)$ singular values close to 1.
Interestingly, the use of large learning rates is required to guarantee an order $O(L)$ NTK, which in turn guarantees infinite-depth convergence of the representations of almost all layers.
arXiv Detail & Related papers (2023-05-30T13:06:26Z) - On minimal representations of shallow ReLU networks [0.0]
We show that the minimal representation for $f$ uses either $n$, $n+1$ or $n+2$ neurons.
In particular, when the input is one-dimensional, minimal representations always use at most $n+1$ neurons, but in all higher-dimensional settings there are functions for which $n+2$ neurons are needed.
arXiv Detail & Related papers (2021-08-12T10:22:24Z) - Geometry of the Loss Landscape in Overparameterized Neural Networks:
Symmetries and Invariances [9.390008801320024]
We show that adding one extra neuron to each layer is sufficient to connect all previously discrete minima into a single manifold.
We show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold.
arXiv Detail & Related papers (2021-05-25T21:19:07Z) - Size and Depth Separation in Approximating Natural Functions with Neural
Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$ (the standard notion of an $\epsilon$-cover is recalled after this list).
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - Sharp Representation Theorems for ReLU Networks with Precise Dependence
on Depth [26.87238691716307]
We prove sharp dimension-free representation results for neural networks with $D$ ReLU layers under square loss.
Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions.
arXiv Detail & Related papers (2020-06-07T05:25:06Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
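The entry on "Polynomial Width is Sufficient for Set Representation with High-dimensional Features" names two concrete set-element embedding layers. Below is a minimal sketch of a DeepSets-style sum decomposition using those two layers; the weights, the choice of powers, and the output map rho are illustrative assumptions rather than the paper's construction.
```python
# Minimal DeepSets-style sketch with the two embedding layers named above:
# (a) linear + power activation (LP) and (b) linear + exponential activation (LE).
# The weights, powers, and output map rho are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
d, m = 4, 8                        # element feature dimension, embedding width
W = rng.standard_normal((m, d))    # shared linear map applied to each set element
powers = np.arange(1, m + 1)       # element-wise integer powers for the LP layer

def embed_lp(x):
    """LP embedding: linear map followed by an element-wise power activation."""
    return (W @ x) ** powers

def embed_le(x):
    """LE embedding: linear map followed by an element-wise exponential."""
    return np.exp(W @ x)

def deep_sets(elements, embed, rho=np.tanh):
    """Permutation-invariant set function: rho applied to the summed embeddings."""
    pooled = np.sum([embed(x) for x in elements], axis=0)
    return rho(pooled)

X = [rng.standard_normal(d) for _ in range(5)]   # a set with 5 elements
print(deep_sets(X, embed_lp))
print(deep_sets(X, embed_le))
```
Because the element embeddings are summed before the output map is applied, the resulting function is permutation invariant by construction, which is the defining property of the DeepSets architecture mentioned above.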
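For the "Small Covers for Near-Zero Sets of Polynomials" entry, the cardinality bound is stated in terms of an $\epsilon$-cover. As a reminder, the standard metric-space definition is given below; the specific set $S$ and metric used in that paper are not reproduced here.
```latex
% Standard definition of an epsilon-cover in a metric space (X, \rho);
% the specific set S and metric used in the paper are not reproduced here.
\[
  C \subseteq X \ \text{is an } \epsilon\text{-cover of } S \subseteq X
  \quad\Longleftrightarrow\quad
  \forall s \in S \ \exists c \in C : \ \rho(s, c) \le \epsilon,
\]
\[
  \text{and the entry above asserts such a cover with } |C| = M = (k/\epsilon)^{O_d(k^{1/d})}.
\]
```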
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.