Metric Hypertransformers are Universal Adapted Maps
- URL: http://arxiv.org/abs/2201.13094v1
- Date: Mon, 31 Jan 2022 10:03:46 GMT
- Title: Metric Hypertransformers are Universal Adapted Maps
- Authors: Beatrice Acciaio, Anastasis Kratsios, Gudmund Pammer
- Abstract summary: Metric hypertransformers (MHTs) are capable of approximating any adapted map $F:\mathscr{X}^{\mathbb{Z}}\rightarrow \mathscr{Y}^{\mathbb{Z}}$ with approximable complexity.
Our results provide the first (quantitative) universal approximation theorem compatible with any such $\mathscr{X}$ and $\mathscr{Y}$.
- Score: 4.83420384410068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a universal class of geometric deep learning models, called
metric hypertransformers (MHTs), capable of approximating any adapted map
$F:\mathscr{X}^{\mathbb{Z}}\rightarrow \mathscr{Y}^{\mathbb{Z}}$ with
approximable complexity, where $\mathscr{X}\subseteq \mathbb{R}^d$ and
$\mathscr{Y}$ is any suitable metric space, and $\mathscr{X}^{\mathbb{Z}}$
(resp. $\mathscr{Y}^{\mathbb{Z}}$) capture all discrete-time paths on
$\mathscr{X}$ (resp. $\mathscr{Y}$). Suitable spaces $\mathscr{Y}$ include
various (adapted) Wasserstein spaces, all Fr\'{e}chet spaces admitting a
Schauder basis, and a variety of Riemannian manifolds arising from information
geometry. Even in the static case, where $f:\mathscr{X}\rightarrow \mathscr{Y}$
is a H\"{o}lder map, our results provide the first (quantitative) universal
approximation theorem compatible with any such $\mathscr{X}$ and $\mathscr{Y}$.
Our universal approximation theorems are quantitative, and they depend on the
regularity of $F$, the choice of activation function, the metric entropy and
diameter of $\mathscr{X}$, and on the regularity of the compact set of paths
whereon the approximation is performed. Our guiding examples originate from
mathematical finance. Notably, the MHT models introduced here are able to
approximate a broad range of stochastic processes' kernels, including solutions
to SDEs, many processes with arbitrarily long memory, and functions mapping
sequential data to sequences of forward rate curves.
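As a rough illustration (not the construction from the paper), the following NumPy sketch shows the kind of causal, adapted sequence-to-sequence model the theorem concerns: the output at time $t$ is produced from a sliding window of inputs up to time $t$ only, so the resulting map $\mathscr{X}^{\mathbb{Z}}\rightarrow\mathscr{Y}^{\mathbb{Z}}$ is adapted by construction. All names, dimensions, and the window size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

class CausalSeqModel:
    """Toy adapted (causal) sequence-to-sequence map: the output y_t
    depends only on the window x_{t-w+1}, ..., x_t of past inputs, so
    the map is adapted to the input filtration by construction.
    Illustrative sketch, NOT the MHT architecture from the paper."""

    def __init__(self, d_in, d_out, window=4, hidden=32):
        self.window = window
        self.W1 = rng.normal(size=(hidden, window * d_in)) / np.sqrt(window * d_in)
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(size=(d_out, hidden)) / np.sqrt(hidden)
        self.b2 = np.zeros(d_out)

    def __call__(self, xs):
        # xs: array of shape (T, d_in); returns (T, d_out)
        T, d_in = xs.shape
        padded = np.vstack([np.zeros((self.window - 1, d_in)), xs])
        ys = []
        for t in range(T):
            ctx = padded[t:t + self.window].ravel()  # only inputs up to time t
            ys.append(self.W2 @ relu(self.W1 @ ctx + self.b1) + self.b2)
        return np.array(ys)

model = CausalSeqModel(d_in=2, d_out=1)
print(model(rng.normal(size=(10, 2))).shape)  # (10, 1)
```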
Related papers
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
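For readers unfamiliar with the layer the entry above refers to, here is a minimal NumPy forward pass of standard multi-head self-attention with $m$ heads; shapes and names are illustrative, and this is not the learning algorithm studied in the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """Standard multi-head self-attention forward pass.
    X: (T, d); Wq, Wk, Wv: (m, d, d_head); Wo: (m * d_head, d)."""
    m, d, d_head = Wq.shape
    heads = []
    for i in range(m):
        Q, K, V = X @ Wq[i], X @ Wk[i], X @ Wv[i]        # (T, d_head) each
        A = softmax(Q @ K.T / np.sqrt(d_head), axis=-1)  # (T, T) attention weights
        heads.append(A @ V)
    return np.concatenate(heads, axis=-1) @ Wo           # (T, d)

rng = np.random.default_rng(1)
T, d, m, dh = 5, 8, 2, 4
X = rng.normal(size=(T, d))
out = multi_head_attention(X,
                           rng.normal(size=(m, d, dh)),
                           rng.normal(size=(m, d, dh)),
                           rng.normal(size=(m, d, dh)),
                           rng.normal(size=(m * dh, d)))
print(out.shape)  # (5, 8)
```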
- Families of costs with zero and nonnegative MTW tensor in optimal transport [0.0]
We compute explicitly the MTW tensor for the optimal transport problem on $\mathbb{R}^n$ with a cost function of the form $\mathsf{c}$.
We analyze the $\sinh$-type hyperbolic cost, providing examples of $\mathsf{c}$-type functions and divergences.
arXiv Detail & Related papers (2024-01-01T20:33:27Z)
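For reference, the MTW (Ma-Trudinger-Wang) tensor discussed above can be written, in one common sign convention (subscripts before the comma denote $x$-derivatives, after it $y$-derivatives, and $c^{r,s}$ is the inverse of the mixed Hessian $c_{r,s}$), as:

```latex
\mathfrak{S}_{(x,y)}(\xi,\eta)
  = \frac{3}{2}\sum_{i,j,k,l,r,s}
    \bigl(c_{ij,r}\,c^{r,s}\,c_{s,kl} - c_{ij,kl}\bigr)\,
    \xi^{i}\xi^{j}\eta^{k}\eta^{l},
  \qquad \xi \in T_x\mathbb{R}^n,\ \eta \in T_y\mathbb{R}^n.
```

Nonnegativity of this tensor for pairs $(\xi,\eta)$ satisfying the cost-orthogonality constraint $\sum_{i,j} c_{i,j}\,\xi^{i}\eta^{j}=0$ is the weak MTW condition (A3w) central to regularity theory in optimal transport; conventions for the sign and the $3/2$ factor vary across references.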
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
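As a toy illustration of the 1-bit observation model mentioned above (without the generative prior or the uniform-recovery machinery of the paper), the following NumPy sketch simulates sign-quantized Gaussian measurements and recovers the signal direction with a naive back-projection estimator; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 2000, 50                   # measurements, ambient dimension

x_star = rng.normal(size=d)
x_star /= np.linalg.norm(x_star)  # 1-bit measurements lose the scale

A = rng.normal(size=(n, d))       # Gaussian sensing matrix
y = np.sign(A @ x_star)           # 1-bit quantized observations

# Naive direction estimate: back-project the signs and normalize.
x_hat = A.T @ y / n
x_hat /= np.linalg.norm(x_hat)

print(f"direction error: {np.linalg.norm(x_hat - x_star):.3f}")
```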
- An Approximation Theory for Metric Space-Valued Functions With A View Towards Deep Learning [25.25903127886586]
We build universal approximators of continuous maps between arbitrary Polish metric spaces $\mathcal{X}$ and $\mathcal{Y}$.
In particular, we show that the required number of Dirac measures is determined by the structure of $\mathcal{X}$ and $\mathcal{Y}$.
arXiv Detail & Related papers (2023-04-24T16:18:22Z)
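To give a concrete feel for approximation by finitely many Dirac measures, here is a hedged NumPy sketch that quantizes an empirical measure into $k$ weighted Dirac atoms using plain Lloyd (k-means) iterations; this is a generic stand-in, not the construction of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def kmeans_quantize(samples, k, iters=50):
    """Approximate an empirical measure by k weighted Dirac measures
    via plain Lloyd iterations (a generic stand-in for quantization
    in Wasserstein distance; not the paper's construction)."""
    centers = samples[rng.choice(len(samples), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = samples[labels == j].mean(axis=0)
    weights = np.bincount(labels, minlength=k) / len(samples)
    return centers, weights  # atoms and masses of sum_j weights[j] * delta_{centers[j]}

samples = rng.normal(size=(5000, 2))
atoms, masses = kmeans_quantize(samples, k=8)
print(atoms.shape, masses.sum())  # (8, 2) 1.0
```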
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
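A minimal NumPy sketch of the setting above, under simplifying assumptions (ReLU activation, Gaussian inputs, clean labels rather than adversarial noise): vanilla gradient descent on the empirical square loss $F(\mathbf{w})$ for a single neuron $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$. This is an illustration of the model, not the paper's noise-robust analysis.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 1000, 10

sigma = lambda z: np.maximum(z, 0.0)  # a monotone activation (ReLU)

w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = sigma(X @ w_star)                 # clean labels for simplicity;
                                      # the paper handles adversarial noise

w = 0.1 * rng.normal(size=d)          # small random init (avoids dead ReLU)
lr = 0.05
for _ in range(500):
    pred = sigma(X @ w)
    # gradient of F(w) = mean (sigma(w.x) - y)^2, using sigma'(z) = 1{z > 0}
    grad = 2 * (X * ((pred - y) * (X @ w > 0))[:, None]).mean(axis=0)
    w -= lr * grad

print(f"final loss: {np.mean((sigma(X @ w) - y) ** 2):.2e}")
```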
- Local approximation of operators [0.0]
We study the problem of determining the degree of approximation of a non-linear operator between metric spaces $\mathfrak{X}$ and $\mathfrak{Y}$.
We establish constructive methods to do this efficiently, i.e., with the constants involved in the estimates on the approximation on $\mathbb{S}^d$ being $\mathcal{O}(d^{1/6})$.
arXiv Detail & Related papers (2022-02-13T19:28:34Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
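Based only on the description above, here is a hedged NumPy sketch of the ternary-random-feature idea: a projection matrix with i.i.d. entries in $\{-1, 0, +1\}$, so the projection needs only additions/subtractions and each weight fits in two bits. The exact entry distribution, nonlinearity, and scaling in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(5)

def ternary_random_features(X, n_features, sparsity=0.5):
    """Random features with i.i.d. ternary weights in {-1, 0, +1}.
    Illustrative sketch of the idea described above, not the paper's
    exact recipe."""
    d = X.shape[1]
    W = rng.choice([-1, 0, 1], size=(d, n_features),
                   p=[(1 - sparsity) / 2, sparsity, (1 - sparsity) / 2])
    return np.sign(X @ W)  # a cheap nonlinearity on the ternary projection

X = rng.normal(size=(100, 20))
Phi = ternary_random_features(X, n_features=256)
K = Phi @ Phi.T / Phi.shape[1]  # the induced random-feature Gram matrix
print(Phi.shape, K.shape)       # (100, 256) (100, 100)
```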
- Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\operatorname{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A\in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_*/\sqrt{n})$.
Here, we bound the error $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$ between the expected resolvent and its deterministic equivalent $\tilde{R}(z)$.
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
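A quick numeric illustration of the objects in the entry above: the sample-covariance resolvent $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and the linear functional $\operatorname{tr}(AR(z))$, whose fluctuations across independent draws can be observed directly. Parameters and the choice of $A$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n = 50, 400
z = -1.0                      # evaluate the resolvent at a real z < 0

def tr_A_resolvent(A):
    X = rng.normal(size=(p, n))
    R = np.linalg.inv(X @ X.T / n - z * np.eye(p))  # R(z) = (XX^T/n - zI)^{-1}
    return np.trace(A @ R)

A = np.eye(p)                 # a simple deterministic test matrix
vals = [tr_A_resolvent(A) for _ in range(200)]
# The entry quotes O(||A||_* / sqrt(n)) fluctuations for such functionals.
print(f"mean {np.mean(vals):.3f}, std {np.std(vals):.4f}")
```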
- Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)
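For intuition about $\epsilon$-covers, here is a generic greedy $\epsilon$-net construction in NumPy for a finite point set; the paper's contribution is a far smaller, structured cover for near-zero sets of polynomials, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(7)

def greedy_epsilon_cover(points, eps):
    """Greedy epsilon-net: repeatedly pick an uncovered point as a new
    center until every point is within eps of some center. Generic
    illustration of an epsilon-cover, not the paper's construction."""
    centers = []
    uncovered = np.ones(len(points), dtype=bool)
    while uncovered.any():
        c = points[np.flatnonzero(uncovered)[0]]
        centers.append(c)
        uncovered &= np.linalg.norm(points - c, axis=1) > eps
    return np.array(centers)

points = rng.normal(size=(2000, 3))
cover = greedy_epsilon_cover(points, eps=1.0)
print(f"{len(cover)} centers cover {len(points)} points at radius 1.0")
```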
- A Canonical Transform for Strengthening the Local $L^p$-Type Universal Approximation Property [4.18804572788063]
$L^p$-type universal approximation theorems guarantee that a given machine learning model class $\mathscr{F}\subseteq C(\mathbb{R}^d,\mathbb{R}^D)$ is dense in $L^p_{\mu}(\mathbb{R}^d,\mathbb{R}^D)$.
This paper proposes a generic solution to this approximation-theoretic problem by introducing a canonical transformation which "upgrades $\mathscr{F}$'s approximation property".
arXiv Detail & Related papers (2020-06-24T17:46:35Z)