Quantitative Rates and Fundamental Obstructions to Non-Euclidean
Universal Approximation with Deep Narrow Feed-Forward Networks
- URL: http://arxiv.org/abs/2101.05390v2
- Date: Wed, 27 Jan 2021 08:38:02 GMT
- Title: Quantitative Rates and Fundamental Obstructions to Non-Euclidean
Universal Approximation with Deep Narrow Feed-Forward Networks
- Authors: Anastasis Kratsios, Leonie Papon
- Abstract summary: We quantify the number of narrow layers required by "deep geometric feed-forward neural networks" (DGNs) to uniformly approximate continuous functions on compacts.
We find that both the global and local universal approximation guarantees can only coincide when approximating null-homotopic functions.
- Score: 3.8073142980733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By incorporating structured pairs of non-trainable input and output layers,
the universal approximation property of feed-forward networks has recently been
extended across a broad range of non-Euclidean input spaces X and output spaces
Y. We quantify the number of narrow layers required for these "deep geometric
feed-forward neural networks" (DGNs) to approximate any continuous function in
$C(X,Y)$, uniformly on compacts. The DGN architecture is then extended to
accommodate complete Riemannian manifolds, where the input and output layers
are only defined locally, and we obtain local analogs of our results. In this
case, we find that both the global and local universal approximation guarantees
can only coincide when approximating null-homotopic functions. Consequently, we
show that if Y is a compact Riemannian manifold, then there exists a function
that cannot be uniformly approximated on large compact subsets of X.
Nevertheless, we obtain lower bounds on the maximum diameter of any geodesic
ball in X wherein our local universal approximation results hold. Applying our
results, we build universal approximators between spaces of non-degenerate
Gaussian measures. We also obtain a quantitative version of the universal
approximation theorem for classical deep narrow feed-forward networks with
general activation functions.
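To make the DGN template concrete: the architecture composes a non-trainable feature map from X into a Euclidean space, a trainable deep narrow feed-forward core, and a non-trainable readout map onto Y. Below is a minimal sketch for the Gaussian-measure application, in which parameterizing a non-degenerate Gaussian by its mean and a (log-diagonal) Cholesky factor of its covariance is our illustrative choice of feature/readout pair, not necessarily the construction used in the paper.

```python
import numpy as np

def feature_map(mean, cov):
    """Non-trainable input layer: encode a non-degenerate Gaussian N(mean, cov) on R^d
    as a Euclidean vector via its mean and lower-triangular Cholesky factor
    (a log on the diagonal keeps the encoding invertible onto valid covariances)."""
    L = np.linalg.cholesky(cov)
    idx = np.tril_indices(len(mean))
    chol = L[idx]
    diag_pos = np.where(idx[0] == idx[1])[0]
    chol[diag_pos] = np.log(np.diag(L))
    return np.concatenate([mean, chol])

def readout_map(z, d):
    """Non-trainable output layer: decode a Euclidean vector back into (mean, cov)."""
    mean, chol = z[:d], z[d:]
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = chol
    np.fill_diagonal(L, np.exp(np.diag(L)))
    return mean, L @ L.T

def narrow_mlp(z, weights, biases, act=np.tanh):
    """Trainable deep narrow feed-forward core acting on the Euclidean encodings."""
    for W, b in zip(weights[:-1], biases[:-1]):
        z = act(W @ z + b)
    return weights[-1] @ z + biases[-1]

# A DGN between spaces of non-degenerate Gaussians on R^d is then the composition
#   mean_out, cov_out = readout_map(narrow_mlp(feature_map(mean, cov), weights, biases), d)
```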
Related papers
- Universal approximation results for neural networks with non-polynomial activation function over non-compact domains [3.3379026542599934]
We derive universal approximation results for neural networks within function spaces over non-compact subsets of a Euclidean space.
We provide some dimension-independent rates for approximating a function with sufficiently regular and integrable Fourier transform by neural networks with non-polynomial activation function.
arXiv Detail & Related papers (2024-10-18T09:53:20Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
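As a rough, hedged illustration of what the finite-particle, time-discretized MFLD looks like: each particle takes a noisy gradient step along the first variation of the objective evaluated at the empirical measure. The toy functional below (matching a target mean) is our own choice for demonstration, not an example from the paper.

```python
import numpy as np

def mfld_step(particles, grad_first_variation, step, temperature, rng):
    """One Euler-Maruyama step of finite-particle mean-field Langevin dynamics:
    drift along the gradient of the objective's first variation at the empirical
    measure, plus Gaussian noise with variance 2 * temperature * step."""
    drift = grad_first_variation(particles)               # shape (n_particles, dim)
    noise = rng.standard_normal(particles.shape)
    return particles - step * drift + np.sqrt(2.0 * temperature * step) * noise

# Toy objective (our choice): F(mu) = 0.5 * ||E_mu[x] - target||^2, whose first
# variation has x-gradient  E_mu[x] - target  (the same vector for every particle).
target = np.array([1.0, -2.0])

def grad_fv(P):
    return np.broadcast_to(P.mean(axis=0) - target, P.shape)

rng = np.random.default_rng(0)
particles = rng.standard_normal((256, 2))
for _ in range(2000):
    particles = mfld_step(particles, grad_fv, step=1e-2, temperature=1e-3, rng=rng)
# The empirical mean converges to `target`; since this toy objective has no
# confining term, the noise slowly spreads the particle cloud.
```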
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Global universal approximation of functional input maps on weighted
spaces [3.8059763597999012]
We introduce so-called functional input neural networks defined on a possibly infinite dimensional weighted space with values also in a possibly infinite dimensional output space.
We prove a global universal approximation result on weighted spaces for continuous functions going beyond the usual approximation on compact sets.
We emphasize that the reproducing kernel Hilbert spaces of signature kernels are Cameron-Martin spaces of certain Gaussian processes.
arXiv Detail & Related papers (2023-06-05T23:06:32Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Optimal 1-Wasserstein Distance for WGANs [2.1174215880331775]
We provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite-sample and asymptotic regimes.
We derive in passing new results on optimal transport theory in the semi-discrete setting.
arXiv Detail & Related papers (2022-01-08T13:04:03Z) - Federated Functional Gradient Boosting [75.06942944563572]
We study functional minimization in Federated Learning.
For both FFGB.C and FFGB.L, the radii of convergence shrink to zero as the feature distributions become more homogeneous.
arXiv Detail & Related papers (2021-03-11T21:49:19Z) - Universal Approximation Property of Neural Ordinary Differential
Equations [19.861764482790544]
We show that NODEs can form an $L^p$-universal approximator for continuous maps under certain conditions.
We also show their stronger approximation property, namely the $\sup$-universality for approximating a large class of diffeomorphisms.
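A minimal, library-free sketch of the NODE picture assumed here: the approximator is the time-one flow map of a learned vector field, integrated below with a fixed-step explicit Euler scheme; the MLP vector field and the solver are illustrative choices rather than the construction analyzed in the paper.

```python
import numpy as np

def neural_ode_map(x, weights, biases, t1=1.0, n_steps=100, act=np.tanh):
    """Map x to z(t1), where dz/dt = f_theta(z) is a small MLP vector field and the
    ODE is integrated with a fixed-step explicit Euler scheme."""
    z, dt = np.array(x, dtype=float), t1 / n_steps
    for _ in range(n_steps):
        h = z
        for W, b in zip(weights[:-1], biases[:-1]):
            h = act(W @ h + b)
        z = z + dt * (weights[-1] @ h + biases[-1])   # Euler step along the learned field
    return z

# Illustrative usage with small random parameters (3-dimensional state, hidden width 16):
rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 3)) * 0.1, rng.standard_normal((3, 16)) * 0.1]
biases = [np.zeros(16), np.zeros(3)]
print(neural_ode_map(np.ones(3), weights, biases))
```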
arXiv Detail & Related papers (2020-12-04T05:53:21Z) - Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
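A heavily simplified sketch of the "randomised, untrainable function" idea: each member's prediction is the sum of a trainable network and a frozen, randomly initialised network that never receives gradients. The paper's exact randomised function (chosen so that the infinite-width ensemble matches the NTK-GP posterior) is more specific than this stand-in.

```python
import numpy as np

def init_mlp(dims, rng):
    """Randomly initialised MLP parameters (illustrative 1/sqrt(fan_in) Gaussian init)."""
    return [(rng.standard_normal((dout, din)) / np.sqrt(din), np.zeros(dout))
            for din, dout in zip(dims[:-1], dims[1:])]

def mlp(x, params, act=np.tanh):
    for W, b in params[:-1]:
        x = act(W @ x + b)
    W, b = params[-1]
    return W @ x + b

class EnsembleMember:
    """One member: a trainable network plus a frozen, randomly initialised network
    whose parameters are never updated (the 'untrainable function')."""
    def __init__(self, dims, rng):
        self.trainable = init_mlp(dims, rng)   # updated by whatever training loop is used
        self.frozen = init_mlp(dims, rng)      # fixed at initialisation, never trained

    def predict(self, x):
        return mlp(x, self.trainable) + mlp(x, self.frozen)

# Disagreement across members then reflects both random initialisation and the frozen additions:
rng = np.random.default_rng(0)
ensemble = [EnsembleMember([2, 32, 1], rng) for _ in range(5)]
preds = np.array([m.predict(np.array([0.5, -0.5])) for m in ensemble])
```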
arXiv Detail & Related papers (2020-07-11T22:10:52Z) - Minimum Width for Universal Approximation [91.02689252671291]
We prove that the minimum width required for the universal approximation of $L^p$ functions is exactly $\max\{d_x+1, d_y\}$.
We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function.
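For example, for $L^p$ approximation of maps with input dimension $d_x = 3$ and output dimension $d_y = 2$, this gives a minimum width of exactly $\max\{3+1, 2\} = 4$.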
arXiv Detail & Related papers (2020-06-16T01:24:21Z) - Non-Euclidean Universal Approximation [4.18804572788063]
Modifications to a neural network's input and output layers are often required to accommodate the specificities of most practical learning tasks.
We present general conditions describing feature and readout maps that preserve an architecture's ability to approximate any continuous functions uniformly on compacts.
arXiv Detail & Related papers (2020-06-03T15:38:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.