Kernel Methods are Competitive for Operator Learning
- URL: http://arxiv.org/abs/2304.13202v2
- Date: Sun, 8 Oct 2023 18:41:08 GMT
- Title: Kernel Methods are Competitive for Operator Learning
- Authors: Pau Batlle, Matthieu Darcy, Bamdad Hosseini, Houman Owhadi
- Abstract summary: We present a kernel-based framework for learning operators between Banach spaces along with a priori error analysis.
We show that, even when using vanilla kernels, our approach is competitive in terms of cost-accuracy trade-off.
- Score: 1.4132765964347058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a general kernel-based framework for learning operators between
Banach spaces along with a priori error analysis and comprehensive numerical
comparisons with popular neural net (NN) approaches such as Deep Operator Net
(DeepONet) [Lu et al.] and Fourier Neural Operator (FNO) [Li et al.]. We
consider the setting where the input/output spaces of the target operator
$\mathcal{G}^\dagger\,:\, \mathcal{U}\to \mathcal{V}$ are reproducing kernel
Hilbert spaces (RKHS), the data comes in the form of partial observations
$\phi(u_i), \varphi(v_i)$ of input/output functions
$v_i=\mathcal{G}^\dagger(u_i)$ ($i=1,\ldots,N$), and the measurement operators
$\phi\,:\, \mathcal{U}\to \mathbb{R}^n$ and $\varphi\,:\, \mathcal{V} \to
\mathbb{R}^m$ are linear. Writing $\psi\,:\, \mathbb{R}^n \to \mathcal{U}$ and
$\chi\,:\, \mathbb{R}^m \to \mathcal{V}$ for the optimal recovery maps
associated with $\phi$ and $\varphi$, we approximate $\mathcal{G}^\dagger$ with
$\bar{\mathcal{G}}=\chi \circ \bar{f} \circ \phi$ where $\bar{f}$ is an optimal
recovery approximation of $f^\dagger:=\varphi \circ \mathcal{G}^\dagger \circ
\psi\,:\,\mathbb{R}^n \to \mathbb{R}^m$. We show that, even when using vanilla
kernels (e.g., linear or Mat\'{e}rn), our approach is competitive in terms of
cost-accuracy trade-off and either matches or beats the performance of NN
methods on a majority of benchmarks. Additionally, our framework offers several
advantages inherited from kernel methods: simplicity, interpretability,
convergence guarantees, a priori error estimates, and Bayesian uncertainty
quantification. As such, it can serve as a natural benchmark for operator
learning.
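To make the pipeline concrete, here is a minimal sketch, under the assumptions that the measurement operators $\phi$ and $\varphi$ are pointwise evaluations on fixed grids, that $\mathcal{G}^\dagger$ is a toy antiderivative operator, and that $\bar{f}$ is plain Matérn-5/2 kernel ridge regression; the names, grids, and data below are illustrative and are not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): the pipeline
# G_bar = chi o f_bar o phi from the abstract, with pointwise-evaluation
# measurement operators and a vanilla Matern-5/2 kernel for f_bar.
# The toy operator, grids, and all names are assumptions.
import numpy as np

def matern52(A, B, ell=4.0):
    """Matern-5/2 kernel matrix between the rows of A and the rows of B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1) / ell
    return (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)

def fit_f_bar(Phi_U, Phi_V, reg=1e-8):
    """Kernel ridge regression f_bar : R^n -> R^m on the data (phi(u_i), varphi(v_i))."""
    K = matern52(Phi_U, Phi_U)
    coef = np.linalg.solve(K + reg * np.eye(len(Phi_U)), Phi_V)   # shape (N, m)
    return lambda phi_u: matern52(np.atleast_2d(phi_u), Phi_U) @ coef

# Toy data: phi = evaluation on n input points, varphi = evaluation on m output
# points, and G_dagger taken to be a simple antiderivative operator.
rng = np.random.default_rng(0)
n, m, N = 32, 48, 200
x_in, x_out = np.linspace(0, 1, n), np.linspace(0, 1, m)
Phi_U = rng.standard_normal((N, n))                           # phi(u_i)
V_on_in = np.cumsum(Phi_U, axis=1) / n                        # v_i on the input grid
Phi_V = np.array([np.interp(x_out, x_in, v) for v in V_on_in])  # varphi(v_i)

f_bar = fit_f_bar(Phi_U, Phi_V)
phi_v_pred = f_bar(Phi_U[0])   # f_bar(phi(u)); chi would then interpolate these
                               # m values back to a function in V.
```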
Related papers
- Operator Learning with Gaussian Processes [0.18641315013048293]
Operator learning focuses on approximating mappings $\mathcal{G}^\dagger : \mathcal{U} \rightarrow \mathcal{V}$ between infinite-dimensional spaces of functions.
We introduce a hybrid GP/NN-based framework for operator learning that leverages the strengths of both methods.
arXiv Detail & Related papers (2024-09-06T18:06:08Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Beyond Independent Measurements: General Compressed Sensing with GNN Application [4.924126492174801]
We consider the problem of recovering a structured signal $\mathbf{x} \in \mathbb{R}^n$ from noisy cone observations.
We show that the effective rank of $\mathbf{B}$ may be used as a surrogate for the number of measurements.
arXiv Detail & Related papers (2021-10-30T20:35:56Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random-feature technique called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Optimal Spectral Recovery of a Planted Vector in a Subspace [80.02218763267992]
We study efficient estimation and detection of a planted vector $v$ whose $\ell_4$ norm differs from that of a Gaussian vector with the same $\ell_2$ norm.
We show that in the regime $n \rho \gg \sqrt{N}$, any spectral method from a large class (and more generally, any low-degree polynomial of the input) fails to detect the planted vector.
arXiv Detail & Related papers (2021-05-31T16:10:49Z)
- Kernel Thinning [26.25415159542831]
Kernel thinning is a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning.
We derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels.
arXiv Detail & Related papers (2021-05-12T17:56:42Z)
- A Canonical Transform for Strengthening the Local $L^p$-Type Universal Approximation Property [4.18804572788063]
$L^p$-type universal approximation theorems guarantee that a given machine learning model class $\mathscr{F} \subseteq C(\mathbb{R}^d, \mathbb{R}^D)$ is dense in $L^p_\mu(\mathbb{R}^d, \mathbb{R}^D)$.
This paper proposes a generic solution to this approximation-theoretic problem by introducing a canonical transformation which "upgrades $\mathscr{F}$'s approximation property".
arXiv Detail & Related papers (2020-06-24T17:46:35Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log \langle \varphi(x), \varphi(y) \rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$ (see the sketch after this list).
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
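For the last entry above, here is a minimal sketch of the linear-time idea, assuming $\varepsilon = 1$ (so the Gibbs kernel is exactly $K_{ij} = \langle \varphi(x_i), \varphi(y_j) \rangle$) and using a placeholder positive feature map; it is not the paper's implementation, only an illustration of why the factored kernel makes each Sinkhorn matrix-vector product cost $O(nr)$ instead of $O(n^2)$.

```python
# Illustrative sketch (not the paper's code): with c(x, y) = -log<phi(x), phi(y)>
# and epsilon = 1, the Gibbs kernel factors as K = Phi_x @ Phi_y.T with
# Phi in R^{n x r}, so each matrix-vector product in the Sinkhorn loop costs
# O(n r) rather than O(n^2). The feature map below is a placeholder assumption.
import numpy as np

def sinkhorn_lowrank(Phi_x, Phi_y, a, b, n_iter=200):
    """Sinkhorn scaling iterations using the factored kernel K = Phi_x @ Phi_y.T."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (Phi_x @ (Phi_y.T @ v))   # K v, computed in O(n r)
        v = b / (Phi_y @ (Phi_x.T @ u))   # K^T u, computed in O(n r)
    return u, v                            # transport plan is diag(u) K diag(v)

# Toy problem with strictly positive placeholder features, r << n.
rng = np.random.default_rng(0)
n, r = 1000, 16
x, y = rng.standard_normal((n, 2)), rng.standard_normal((n, 2))
W = rng.standard_normal((2, r))
Phi_x, Phi_y = np.exp(x @ W), np.exp(y @ W)   # rows lie in the positive orthant
a = b = np.full(n, 1.0 / n)                   # uniform source/target marginals
u, v = sinkhorn_lowrank(Phi_x, Phi_y, a, b)
```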