Deep Learning in High Dimension: Neural Network Approximation of
Analytic Functions in $L^2(\mathbb{R}^d,\gamma_d)$
- URL: http://arxiv.org/abs/2111.07080v1
- Date: Sat, 13 Nov 2021 09:54:32 GMT
- Title: Deep Learning in High Dimension: Neural Network Approximation of
Analytic Functions in $L^2(\mathbb{R}^d,\gamma_d)$
- Authors: Christoph Schwab and Jakob Zech
- Abstract summary: We prove expression rates for analytic functions $f:\mathbb{R}^d\to\mathbb{R}$ in the norm of $L^2(\mathbb{R}^d,\gamma_d)$.
We consider in particular ReLU and ReLU${}^k$ activations for integer $k\geq 2$.
As an application, we prove expression rate bounds of deep ReLU-NNs for response surfaces of elliptic PDEs with log-Gaussian random field inputs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For artificial deep neural networks, we prove expression rates for analytic
functions $f:\mathbb{R}^d\to\mathbb{R}$ in the norm of
$L^2(\mathbb{R}^d,\gamma_d)$ where $d\in {\mathbb{N}}\cup\{ \infty \}$. Here
$\gamma_d$ denotes the Gaussian product probability measure on $\mathbb{R}^d$.
We consider in particular ReLU and ReLU${}^k$ activations for integer $k\geq
2$. For $d\in\mathbb{N}$, we show exponential convergence rates in
$L^2(\mathbb{R}^d,\gamma_d)$. In case $d=\infty$, under suitable smoothness and
sparsity assumptions on $f:\mathbb{R}^{\mathbb{N}}\to\mathbb{R}$, with
$\gamma_\infty$ denoting an infinite (Gaussian) product measure on
$\mathbb{R}^{\mathbb{N}}$, we prove dimension-independent expression rate
bounds in the norm of $L^2(\mathbb{R}^{\mathbb{N}},\gamma_\infty)$. The rates
only depend on quantified holomorphy of (an analytic continuation of) the map
$f$ to a product of strips in $\mathbb{C}^d$. As an application, we prove
expression rate bounds of deep ReLU-NNs for response surfaces of elliptic PDEs
with log-Gaussian random field inputs.
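To make the setting concrete, here is a minimal numpy sketch (our own illustration, not the paper's construction) showing the ReLU${}^k$ activation and a Monte Carlo estimate of the $L^2(\mathbb{R}^d,\gamma_d)$ error of a small network against a toy analytic target; the dimension, width, target function, and (untrained) random weights are all placeholder assumptions.

```python
# Illustrative sketch (assumed setup, not the paper's construction):
# estimate the L^2(R^d, gamma_d) error of a ReLU^k network by Monte Carlo,
# where gamma_d is the standard Gaussian product measure on R^d.
import numpy as np

rng = np.random.default_rng(0)
d, width, k = 4, 64, 2          # input dimension, hidden width, ReLU^k power (assumed values)

def relu_k(z, k=2):
    """ReLU^k activation: max(0, z)**k (k = 1 gives plain ReLU)."""
    return np.maximum(z, 0.0) ** k

def f(x):
    """Toy analytic target f : R^d -> R (a stand-in for the functions considered in the paper)."""
    return np.exp(0.1 * x.sum(axis=-1))

# A small one-hidden-layer ReLU^k network with random, untrained parameters.
W1 = rng.normal(size=(d, width)) / np.sqrt(d)
b1 = rng.normal(size=width)
W2 = rng.normal(size=width) / np.sqrt(width)
b2 = 0.0

def network(x):
    return relu_k(x @ W1 + b1, k) @ W2 + b2

# Monte Carlo estimate of ||f - network||_{L^2(gamma_d)}:
# draw x ~ N(0, I_d), average the squared discrepancy, take the square root.
x = rng.normal(size=(100_000, d))
err = np.sqrt(np.mean((f(x) - network(x)) ** 2))
print(f"estimated L2(gamma_d) error of the (untrained) network: {err:.4f}")
```

The paper's expression rate bounds quantify how fast this error can be driven to zero as the network size grows, for analytic (resp. suitably holomorphic) targets $f$.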
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with an arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\,\dots$
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Noncompact uniform universal approximation [0.0]
The universal approximation theorem is generalised to uniform convergence on the (noncompact) input space $\mathbb{R}^n$.
All continuous functions that vanish at infinity can be uniformly approximated by neural networks.
arXiv Detail & Related papers (2023-08-07T08:54:21Z) - An Over-parameterized Exponential Regression [18.57735939471469]
Recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions.
We define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \times \cdots \times \mathbb{R}^d$
arXiv Detail & Related papers (2023-03-29T07:29:07Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Beyond Independent Measurements: General Compressed Sensing with GNN
Application [4.924126492174801]
We consider the problem of recovering a structured signal $\mathbf{x} \in \mathbb{R}^n$ from noisy cone observations.
We show that the effective rank of $\mathbf{B}$ may be used as a surrogate for the number of measurements.
arXiv Detail & Related papers (2021-10-30T20:35:56Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn with gradient descent, beyond the Neural Tangent Kernel (NTK) regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - A Canonical Transform for Strengthening the Local $L^p$-Type Universal
Approximation Property [4.18804572788063]
$L^p$-type universal approximation theorems guarantee that a given machine learning model class $\mathscr{F}\subseteq C(\mathbb{R}^d,\mathbb{R}^D)$ is dense in $L^p_\mu(\mathbb{R}^d,\mathbb{R}^D)$.
This paper proposes a generic solution to this approximation theoretic problem by introducing a canonical transformation which "upgrades $\mathscr{F}$'s approximation property".
arXiv Detail & Related papers (2020-06-24T17:46:35Z) - Learning Theory for Estimation of Animal Motion Submanifolds [0.0]
This paper describes the formulation and experimental testing of a novel method for the estimation and approximation of submanifold models of animal motion.
Experiments generate a finite set $\{(s_i,x_i)\}_{i=1}^m \subset \mathbb{Z}^m$ of samples generated according to an unknown probability density.
arXiv Detail & Related papers (2020-03-30T20:54:51Z)