An Approximation Theory for Metric Space-Valued Functions With A View
Towards Deep Learning
- URL: http://arxiv.org/abs/2304.12231v2
- Date: Mon, 24 Jul 2023 16:00:37 GMT
- Title: An Approximation Theory for Metric Space-Valued Functions With A View
Towards Deep Learning
- Authors: Anastasis Kratsios, Chong Liu, Matti Lassas, Maarten V. de Hoop, Ivan
Dokmani\'c
- Abstract summary: We build universal function approximators of continuous maps between arbitrary Polish metric spaces $\mathcal{X}$ and $\mathcal{Y}$.
In particular, we show that the required number of Dirac measures is determined by the structure of $\mathcal{X}$ and $\mathcal{Y}$.
- Score: 25.25903127886586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by the developing mathematics of deep learning, we build universal
function approximators of continuous maps between arbitrary Polish metric
spaces $\mathcal{X}$ and $\mathcal{Y}$ using elementary functions between
Euclidean spaces as building blocks. Earlier results assume that the target
space $\mathcal{Y}$ is a topological vector space. We overcome this limitation
by ``randomization'': our approximators output discrete probability measures
over $\mathcal{Y}$. When $\mathcal{X}$ and $\mathcal{Y}$ are Polish without
additional structure, we prove very general qualitative guarantees; when they
have suitable combinatorial structure, we prove quantitative guarantees for
H\"{o}lder-like maps, including maps between finite graphs, solution operators
to rough differential equations between certain Carnot groups, and continuous
non-linear operators between Banach spaces arising in inverse problems. In
particular, we show that the required number of Dirac measures is determined by
the combinatorial structure of $\mathcal{X}$ and $\mathcal{Y}$. For barycentric
$\mathcal{Y}$, including Banach spaces, $\mathbb{R}$-trees, Hadamard manifolds,
or Wasserstein spaces on Polish metric spaces, our approximators reduce to
$\mathcal{Y}$-valued functions. When the Euclidean approximators are neural
networks, our constructions generalize transformer networks, providing a new
probabilistic viewpoint of geometric deep learning.
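As a rough illustration of the "randomization" idea above, the following NumPy sketch maps an input to a finitely supported probability measure (convex weights attached to Dirac atoms in a stand-in target space) and, for a barycentric target, reduces that measure to a single $\mathcal{Y}$-valued prediction via its mean. This is not the paper's construction; the network architecture, the atoms, and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: approximate f: X -> Y by outputting a discrete probability
# measure sum_k w_k(x) * delta_{y_k} over Y. Here Y is a finite set of
# "atoms" in R^2 standing in for a general Polish target space.
atoms = rng.normal(size=(5, 2))            # y_1, ..., y_5 (hypothetical)

# A small Euclidean feed-forward network producing the mixture weights.
W1 = rng.normal(size=(3, 16))
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 5))
b2 = np.zeros(5)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def measure_valued_model(x):
    """Map x in R^3 to a discrete probability measure over the atoms."""
    h = np.maximum(x @ W1 + b1, 0.0)       # ReLU hidden layer
    w = softmax(h @ W2 + b2)               # mixture weights w_k(x)
    return list(zip(w, atoms))             # pairs (weight, Dirac location)

def barycenter(measure):
    """For a barycentric target (e.g. a Banach space) the measure reduces
    to a single Y-valued prediction: its mean."""
    return sum(w * y for w, y in measure)

x = rng.normal(size=3)
mu_hat = measure_valued_model(x)
print(barycenter(mu_hat))
```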
Related papers
- A Theory of Interpretable Approximations [61.90216959710842]
We study the idea of approximating a target concept $c$ by a small aggregation of concepts from some base class $\mathcal{H}$.
For any given pair of $\mathcal{H}$ and $c$, exactly one of these cases holds: (i) $c$ cannot be approximated by $\mathcal{H}$ with arbitrary accuracy.
We show that, in the case of interpretable approximations, even a slightly nontrivial a-priori guarantee on the complexity of approximations implies approximations with constant (distribution-free and accuracy-independent) complexity.
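To make the notion of a small aggregation concrete, here is a toy NumPy example that is not taken from the paper: the target concept is an interval on the line, the base class consists of one-sided threshold concepts, and a strict-majority vote of two base concepts recovers the target exactly. The concept, thresholds, and data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=5000)
c = ((x >= -0.3) & (x <= 0.6)).astype(int)       # target concept (hypothetical)

# Base class H: one-sided threshold concepts on the line.
def h_right(t):   # 1 on [t, +inf)
    return lambda x: (x >= t).astype(int)

def h_left(t):    # 1 on (-inf, t]
    return lambda x: (x <= t).astype(int)

# A small aggregation: a strict-majority vote of two base concepts equals
# their intersection, which recovers the interval exactly.
members = [h_right(-0.3), h_left(0.6)]
votes = sum(h(x) for h in members)
approx = (votes > len(members) / 2).astype(int)   # strict majority = AND

print("agreement with target concept:", np.mean(approx == c))  # -> 1.0
```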
arXiv Detail & Related papers (2024-06-15T06:43:45Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
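For reference, a minimal NumPy forward pass of a standard multi-head attention layer of the kind studied there; the token count, head count, and random weights are placeholders, and this sketch carries none of the paper's learning-theoretic analysis.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """Multi-head attention forward pass.

    X : (n, d) sequence of n tokens; Wq, Wk, Wv : (m, d, d_h) per-head
    projections; Wo : (m * d_h, d) output projection.
    """
    m, d, d_h = Wq.shape
    heads = []
    for i in range(m):
        Q, K, V = X @ Wq[i], X @ Wk[i], X @ Wv[i]        # (n, d_h) each
        A = softmax(Q @ K.T / np.sqrt(d_h), axis=-1)     # attention weights
        heads.append(A @ V)                              # (n, d_h)
    return np.concatenate(heads, axis=-1) @ Wo           # (n, d)

n, d, m, d_h = 6, 8, 2, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(m, d, d_h)) for _ in range(3))
Wo = rng.normal(size=(m * d_h, d))
print(multi_head_attention(X, Wq, Wk, Wv, Wo).shape)     # (6, 8)
```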
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Manifold learning in Wasserstein space [2.9581047417235298]
We build the theoretical foundations for manifold learning algorithms in the space of probability measures on a compact and convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $\mathrm{W}$.
We show that the metric space $(\Lambda,\mathrm{W}_\Lambda)$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $\mathrm{W}(\lambda_i,\lambda_j)$.
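A small sketch of the distance graph described there, under simplifying assumptions of my own: the measures $\lambda_i$ are one-dimensional empirical measures with equally many atoms, so the Wasserstein-2 distance has a closed form via sorted samples. The Gaussian family below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def w2_1d(samples_a, samples_b):
    """Wasserstein-2 distance between two empirical measures on R with the
    same number of equally weighted atoms (monotone/quantile coupling)."""
    a, b = np.sort(samples_a), np.sort(samples_b)
    return np.sqrt(np.mean((a - b) ** 2))

# A small family of measures lambda_1, ..., lambda_N (here: Gaussians on R,
# represented by samples) standing in for points of Lambda.
N, n_samples = 8, 500
measures = [rng.normal(loc=mu, scale=1.0, size=n_samples)
            for mu in np.linspace(-2.0, 2.0, N)]

# Edge weights of the graph: pairwise distances W(lambda_i, lambda_j).
W = np.array([[w2_1d(measures[i], measures[j]) for j in range(N)]
              for i in range(N)])
print(np.round(W, 2))
```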
arXiv Detail & Related papers (2023-11-14T21:21:35Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\cdot\mathrm{OPT} + \epsilon$ with high probability.
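A minimal sketch of the setting, assuming a ReLU activation, Gaussian inputs, the squared loss, and a crude corruption model in place of a true adversary; none of this reproduces the paper's algorithm or guarantees.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 5, 4000

# Ground-truth neuron x |-> sigma(w* . x) with ReLU activation (monotone).
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star, 0.0)

# Crude stand-in for adversarial label noise: corrupt a small fraction.
mask = rng.random(n) < 0.05
y[mask] += rng.normal(scale=5.0, size=mask.sum())

# Plain gradient descent on the squared loss F(w) = E[(sigma(w.x) - y)^2].
w = 0.1 * rng.normal(size=d)
lr = 0.05
for _ in range(500):
    pred = np.maximum(X @ w, 0.0)
    grad = 2.0 / n * X.T @ ((pred - y) * (X @ w > 0))   # chain rule for ReLU
    w -= lr * grad

print("parameter error:", np.linalg.norm(w - w_star))
```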
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Local approximation of operators [0.0]
We study the problem of determining the degree of approximation of a non-linear operator between metric spaces $\mathfrak{X}$ and $\mathfrak{Y}$.
We establish constructive methods to do this efficiently, i.e., with the constants involved in the estimates on the approximation on $\mathbb{S}^d$ being $\mathcal{O}(d^{1/6})$.
arXiv Detail & Related papers (2022-02-13T19:28:34Z) - Metric Hypertransformers are Universal Adapted Maps [4.83420384410068]
Metric hypertransformers (MHTs) are capable of approximating any adapted map $F:\mathscr{X}^{\mathbb{Z}}\rightarrow \mathscr{Y}^{\mathbb{Z}}$ with approximable complexity.
Our results provide the first (quantitative) universal approximation theorem compatible with any such $\mathscr{X}$ and $\mathscr{Y}$.
arXiv Detail & Related papers (2022-01-31T10:03:46Z) - Universal Regular Conditional Distributions via Probability
Measure-Valued Deep Neural Models [3.8073142980733]
We find that any model built using the proposed framework is dense in the space $C(\mathcal{X},\mathcal{P}_1(\mathcal{Y}))$.
The proposed models are also shown to be capable of generically expressing the aleatoric uncertainty present in most randomized machine learning models.
arXiv Detail & Related papers (2021-05-17T11:34:09Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - Average-case Complexity of Teaching Convex Polytopes via Halfspace
Queries [55.28642461328172]
We show that the average-case teaching complexity is $\Theta(d)$, which is in sharp contrast to the worst-case teaching complexity of $\Theta(n)$.
Our insights allow us to establish a tight bound on the average-case complexity for $\phi$-separable dichotomies.
arXiv Detail & Related papers (2020-06-25T19:59:24Z) - A diffusion approach to Stein's method on Riemannian manifolds [65.36007959755302]
We exploit the relationship between the generator of a diffusion on $\mathbf{M}$ with target invariant measure and its characterising Stein operator.
We derive Stein factors, which bound the solution to the Stein equation and its derivatives.
This implies that the bounds for $\mathbb{R}^m$ remain valid when $\mathbf{M}$ is a flat manifold.
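As a concrete Euclidean instance of the generator-based Stein operator mentioned above (an illustration of the general idea, not a formula quoted from the paper): for the standard Gaussian target on $\mathbb{R}^m$, the Ornstein--Uhlenbeck generator serves as the Stein operator, and the associated Stein equation reads as follows.

```latex
% Generator of the Ornstein--Uhlenbeck diffusion, whose invariant measure
% is the standard Gaussian \gamma on \mathbb{R}^m; it acts as a Stein operator:
\mathcal{A}f(x) \;=\; \Delta f(x) \;-\; x \cdot \nabla f(x).

% Stein equation for a test function h: one solves for f, and the "Stein
% factors" are bounds on f and its derivatives:
\mathcal{A}f(x) \;=\; h(x) \;-\; \mathbb{E}_{Z \sim \gamma}\big[h(Z)\big].
```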
arXiv Detail & Related papers (2020-03-25T17:03:58Z) - Learning the Hypotheses Space from data Part II: Convergence and
Feasibility [0.0]
We show consistency of a model selection framework based on Learning Spaces.
We show that it is feasible to learn the Hypotheses Space from data.
arXiv Detail & Related papers (2020-01-30T21:48:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.