Deep Operator Network Approximation Rates for Lipschitz Operators
- URL: http://arxiv.org/abs/2307.09835v1
- Date: Wed, 19 Jul 2023 08:46:47 GMT
- Title: Deep Operator Network Approximation Rates for Lipschitz Operators
- Authors: Christoph Schwab, Andreas Stein and Jakob Zech
- Abstract summary: We establish universality and expression rate bounds for a class of neural Deep Operator Networks (DON) emulating Lipschitz (or H\"older) continuous maps.
The DON architecture considered uses linear encoders $\mathcal E$ and decoders $\mathcal D$ via (biorthogonal) Riesz bases of $\mathcal X$, $\mathcal Y$.
Key in the proof of the present expression rate bounds is the use of either super-expressive activations or nonstandard NN architectures with standard (ReLU) activations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We establish universality and expression rate bounds for a class of neural
Deep Operator Networks (DON) emulating Lipschitz (or H\"older) continuous maps
$\mathcal G:\mathcal X\to\mathcal Y$ between (subsets of) separable Hilbert
spaces $\mathcal X$, $\mathcal Y$. The DON architecture considered uses linear
encoders $\mathcal E$ and decoders $\mathcal D$ via (biorthogonal) Riesz bases
of $\mathcal X$, $\mathcal Y$, and an approximator network of an
infinite-dimensional, parametric coordinate map that is Lipschitz continuous on
the sequence space $\ell^2(\mathbb N)$. Unlike previous works ([Herrmann,
Schwab and Zech: Neural and Spectral operator surrogates: construction and
expression rate bounds, SAM Report, 2022], [Marcati and Schwab: Exponential
Convergence of Deep Operator Networks for Elliptic Partial Differential
Equations, SAM Report, 2022]), which required for example $\mathcal G$ to be
holomorphic, the present expression rate results require mere Lipschitz (or
H\"older) continuity of $\mathcal G$. Key in the proof of the present
expression rate bounds is the use of either super-expressive activations (e.g.
[Yarotski: Elementary superexpressive activations, Int. Conf. on ML, 2021],
[Shen, Yang and Zhang: Neural network approximation: Three hidden layers are
enough, Neural Networks, 2021], and the references there) which are inspired by
the Kolmogorov superposition theorem, or of nonstandard NN architectures with
standard (ReLU) activations as recently proposed in [Zhang, Shen and Yang:
Neural Network Architecture Beyond Width and Depth, Adv. in Neural Inf. Proc.
Sys., 2022]. We illustrate the abstract results by approximation rate bounds
for emulation of a) solution operators for parametric elliptic variational
inequalities, and b) Lipschitz maps of Hilbert-Schmidt operators.
Related papers
- New advances in universal approximation with neural networks of minimal width [4.424170214926035]
We show that autoencoders with leaky ReLU activations are universal approximators of $L^p$ functions.
We broaden our results to show that smooth invertible neural networks can approximate $L^p(\mathbb{R}^d,\mathbb{R}^d)$ on compacta.
arXiv Detail & Related papers (2024-11-13T16:17:16Z) - Operator Learning with Gaussian Processes [0.18641315013048293]
Operator learning focuses on approximating mappings $\mathcal{G}^\dagger:\mathcal{U}\rightarrow\mathcal{V}$ between infinite-dimensional spaces of functions.
We introduce a hybrid GP/NN-based framework for operator learning that leverages the strengths of both methods.
arXiv Detail & Related papers (2024-09-06T18:06:08Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Kernel Methods are Competitive for Operator Learning [1.4132765964347058]
We present a kernel-based framework for learning operators between Banach spaces along with a priori error analysis.
We show that, even when using vanilla kernels, our approach is competitive in terms of cost-accuracy trade-off.
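The "vanilla kernels" claim above can be made concrete with a generic kernel-ridge sketch. This is a standard construction under my own assumptions (training pairs of functions sampled on a fixed grid, a Gaussian/RBF kernel on the sampled inputs, one ridge regression per output sample point), not the specific framework of the cited paper.

```python
import numpy as np

def rbf_kernel(U, V, gamma=1.0):
    """Gram matrix between rows of U and rows of V (functions as sample vectors)."""
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit(U_train, V_train, gamma=1.0, reg=1e-8):
    """Kernel ridge regression from input samples to output samples."""
    K = rbf_kernel(U_train, U_train, gamma)
    return np.linalg.solve(K + reg * np.eye(K.shape[0]), V_train)

def predict(U_test, U_train, alpha, gamma=1.0):
    """Evaluate the learned operator surrogate on new sampled inputs."""
    return rbf_kernel(U_test, U_train, gamma) @ alpha
```

For instance, fitting on pairs `(u_i, G(u_i))` for a simple operator such as pointwise scaling reproduces the training outputs up to the ridge regularization error.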
arXiv Detail & Related papers (2023-04-26T00:07:59Z) - An Approximation Theory for Metric Space-Valued Functions With A View
Towards Deep Learning [25.25903127886586]
We build universal function approximators of continuous maps between arbitrary Polish metric spaces $\mathcal{X}$ and $\mathcal{Y}$.
In particular, we show that the required number of Dirac measures is determined by the structure of $\mathcal{X}$ and $\mathcal{Y}$.
arXiv Detail & Related papers (2023-04-24T16:18:22Z) - Quantum and classical low-degree learning via a dimension-free Remez
inequality [52.12931955662553]
We show a new way to relate functions on the hypergrid to their harmonic extensions over the polytorus.
We bound the supremum of a function $f$ over products of the cyclic group $\{\exp(2\pi i k/K)\}_{k=1}^{K}$.
We extend to new spaces a recent line of work [EI22, CHP, VZ22] that gave similarly efficient methods for learning low-degree functions on hypercubes and observables on qubits.
arXiv Detail & Related papers (2023-01-04T04:15:40Z) - On the Sample Complexity of Two-Layer Networks: Lipschitz vs.
Element-Wise Lipschitz Activation [20.70453775428433]
We investigate the sample complexity of bounded two-layer neural networks using different activation functions.
We prove that if $\sigma$ is element-wise, then the sample complexity of $\mathcal{H}$ has only logarithmic dependency in width.
arXiv Detail & Related papers (2022-11-17T16:27:15Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.