Deep Learning in High Dimension: Neural Network Approximation of
Analytic Functions in $L^2(\mathbb{R}^d,\gamma_d)$
- URL: http://arxiv.org/abs/2111.07080v1
- Date: Sat, 13 Nov 2021 09:54:32 GMT
- Title: Deep Learning in High Dimension: Neural Network Approximation of
Analytic Functions in $L^2(\mathbb{R}^d,\gamma_d)$
- Authors: Christoph Schwab and Jakob Zech
- Abstract summary: We prove expression rates for analytic functions $f:\mathbb{R}^d\to\mathbb{R}$ in the norm of $L^2(\mathbb{R}^d,\gamma_d)$.
We consider in particular ReLU and ReLU${}^k$ activations for integer $k\geq 2$.
As an application, we prove expression rate bounds of deep ReLU-NNs for response surfaces of elliptic PDEs with log-Gaussian random field inputs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For artificial deep neural networks, we prove expression rates for analytic
functions $f:\mathbb{R}^d\to\mathbb{R}$ in the norm of
$L^2(\mathbb{R}^d,\gamma_d)$ where $d\in {\mathbb{N}}\cup\{ \infty \}$. Here
$\gamma_d$ denotes the Gaussian product probability measure on $\mathbb{R}^d$.
We consider in particular ReLU and ReLU${}^k$ activations for integer $k\geq
2$. For $d\in\mathbb{N}$, we show exponential convergence rates in
$L^2(\mathbb{R}^d,\gamma_d)$. In case $d=\infty$, under suitable smoothness and
sparsity assumptions on $f:\mathbb{R}^{\mathbb{N}}\to\mathbb{R}$, with
$\gamma_\infty$ denoting an infinite (Gaussian) product measure on
$\mathbb{R}^{\mathbb{N}}$, we prove dimension-independent expression rate
bounds in the norm of $L^2(\mathbb{R}^{\mathbb{N}},\gamma_\infty)$. The rates
only depend on quantified holomorphy of (an analytic continuation of) the map
$f$ to a product of strips in $\mathbb{C}^d$. As an application, we prove
expression rate bounds of deep ReLU-NNs for response surfaces of elliptic PDEs
with log-Gaussian random field inputs.
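To make the setting concrete, here is a minimal numpy sketch (our own illustration, not the paper's construction) showing the ReLU${}^k$ activation and a Monte Carlo estimate of the $L^2(\mathbb{R}^d,\gamma_d)$ error of a small network against a toy analytic target; the dimension, width, target function, and (untrained) random weights are all placeholder assumptions.

```python
# Illustrative sketch (assumed setup, not the paper's construction):
# estimate the L^2(R^d, gamma_d) error of a ReLU^k network by Monte Carlo,
# where gamma_d is the standard Gaussian product measure on R^d.
import numpy as np

rng = np.random.default_rng(0)
d, width, k = 4, 64, 2          # input dimension, hidden width, ReLU^k power (assumed values)

def relu_k(z, k=2):
    """ReLU^k activation: max(0, z)**k (k = 1 gives plain ReLU)."""
    return np.maximum(z, 0.0) ** k

def f(x):
    """Toy analytic target f : R^d -> R (a stand-in for the functions considered in the paper)."""
    return np.exp(0.1 * x.sum(axis=-1))

# A small one-hidden-layer ReLU^k network with random, untrained parameters.
W1 = rng.normal(size=(d, width)) / np.sqrt(d)
b1 = rng.normal(size=width)
W2 = rng.normal(size=width) / np.sqrt(width)
b2 = 0.0

def network(x):
    return relu_k(x @ W1 + b1, k) @ W2 + b2

# Monte Carlo estimate of ||f - network||_{L^2(gamma_d)}:
# draw x ~ N(0, I_d), average the squared discrepancy, take the square root.
x = rng.normal(size=(100_000, d))
err = np.sqrt(np.mean((f(x) - network(x)) ** 2))
print(f"estimated L2(gamma_d) error of the (untrained) network: {err:.4f}")
```

The paper's expression rate bounds quantify how fast this error can be driven to zero as the network size grows, for analytic (resp. suitably holomorphic) targets $f$.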
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with an arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\,\dots$
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Noncompact uniform universal approximation [0.0]
The universal approximation theorem is generalised to uniform convergence on the (noncompact) input space $\mathbb{R}^n$.
All continuous functions that vanish at infinity can be uniformly approximated by neural networks.
arXiv Detail & Related papers (2023-08-07T08:54:21Z) - An Over-parameterized Exponential Regression [18.57735939471469]
Recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions.
We define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \times \cdots \times \mathbb{R}^d$
arXiv Detail & Related papers (2023-03-29T07:29:07Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Beyond Independent Measurements: General Compressed Sensing with GNN
Application [4.924126492174801]
We consider the problem of recovering a structured signal $\mathbf{x} \in \mathbb{R}^n$ from noisy cone observations.
We show that the effective rank of $\mathbf{B}$ may be used as a surrogate for the number of measurements.
arXiv Detail & Related papers (2021-10-30T20:35:56Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn with gradient descent, beyond the Neural Tangent Kernel (NTK) regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - A Canonical Transform for Strengthening the Local $L^p$-Type Universal
Approximation Property [4.18804572788063]
$L^p$-type universal approximation theorems guarantee that a given machine learning model class $\mathscr{F}\subseteq C(\mathbb{R}^d,\mathbb{R}^D)$ is dense in $L^p_\mu(\mathbb{R}^d,\mathbb{R}^D)$.
This paper proposes a generic solution to this approximation theoretic problem by introducing a canonical transformation which "upgrades $\mathscr{F}$'s approximation property".
arXiv Detail & Related papers (2020-06-24T17:46:35Z) - Learning Theory for Estimation of Animal Motion Submanifolds [0.0]
This paper describes the formulation and experimental testing of a novel method for the estimation and approximation of submanifold models of animal motion.
Experiments generate a finite set $\{(s_i,x_i)\}_{i=1}^m \subset \mathbb{Z}^m$ of samples generated according to an unknown probability density.
arXiv Detail & Related papers (2020-03-30T20:54:51Z)