The necessity of depth for artificial neural networks to approximate
certain classes of smooth and bounded functions without the curse of
dimensionality
- URL: http://arxiv.org/abs/2301.08284v1
- Date: Thu, 19 Jan 2023 19:52:41 GMT
- Title: The necessity of depth for artificial neural networks to approximate
certain classes of smooth and bounded functions without the curse of
dimensionality
- Authors: Lukas Gonon and Robin Graeber and Arnulf Jentzen
- Abstract summary: We study high-dimensional approximation capacities of shallow and deep artificial neural networks (ANNs) with the rectified linear unit (ReLU) activation.
In particular, it is a key contribution of this work to reveal that for all $a,b\in\mathbb{R}$ with $b-a\geq 7$ we have that the functions $[a,b]^d\ni x=(x_1,\dots,x_d)\mapsto\prod_{i=1}^d x_i\in\mathbb{R}$ for $d\in\mathbb{N}$, as well as $x\mapsto\sin(\prod_{i=1}^d x_i)$, can neither be approximated without the curse of dimensionality by shallow nor by insufficiently deep ReLU ANNs, but can be approximated without the curse of dimensionality by sufficiently deep ReLU ANNs.
- Score: 4.425982186154401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article we study high-dimensional approximation capacities of shallow
and deep artificial neural networks (ANNs) with the rectified linear unit
(ReLU) activation. In particular, it is a key contribution of this work to
reveal that for all $a,b\in\mathbb{R}$ with $b-a\geq 7$ we have that the
functions $[a,b]^d\ni x=(x_1,\dots,x_d)\mapsto\prod_{i=1}^d x_i\in\mathbb{R}$
for $d\in\mathbb{N}$ as well as the functions $[a,b]^d\ni x =(x_1,\dots,
x_d)\mapsto\sin(\prod_{i=1}^d x_i) \in \mathbb{R} $ for $ d \in \mathbb{N} $
can neither be approximated without the curse of dimensionality by means of
shallow ANNs nor by insufficiently deep ANNs with ReLU activation, but can be
approximated without the curse of dimensionality by sufficiently deep ANNs with
ReLU activation. We show that the product functions and the sine of the product
functions are polynomially tractable approximation problems among the
approximating class of deep ReLU ANNs with the number of hidden layers being
allowed to grow in the dimension $ d \in \mathbb{N} $. We establish the above
outlined statements not only for the product functions and the sine of the
product functions but also for other classes of target functions, in
particular, for classes of uniformly globally bounded $ C^{ \infty }
$-functions with compact support on any $[a,b]^d$ with $a\in\mathbb{R}$,
$b\in(a,\infty)$. Roughly speaking, in this work we lay open that simple
approximation problems such as approximating the sine or cosine of products
cannot be solved in standard implementation frameworks by shallow or
insufficiently deep ANNs with ReLU activation in polynomial time, but can be
approximated by sufficiently deep ReLU ANNs with the number of parameters
growing at most polynomially.
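To make the role of depth concrete, the sketch below implements the classical Yarotsky-type construction that underlies results of this kind: a ReLU network approximates $x^2$ on $[0,1]$ by subtracting rescaled compositions of a piecewise-linear "hat" function, pairwise products then follow from a polarization identity, and the product of $d$ inputs is assembled by a binary tree of such approximate multiplications, so the depth grows with the accuracy parameter and with $\log_2 d$ while the number of units stays polynomial. This is an illustrative NumPy sketch of that standard construction, not the authors' proof; the accuracy parameter $m$ and the restriction of the inputs to $[0,1]$ are simplifying assumptions made here.

```python
# Minimal sketch (Yarotsky-type construction, not the authors' proof) of a deep
# ReLU approximation of the product function on [0, 1]^d.  Depth grows with the
# accuracy parameter m and with log2(d).
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # Piecewise-linear "tooth": g(x) = 2x on [0, 1/2], 2 - 2x on [1/2, 1],
    # written with two ReLU units (valid for x in [0, 1]).
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def square_approx(x, m):
    # f_m(x) = x - sum_{s=1}^m g^{(s)}(x) / 4^s approximates x^2 on [0, 1]
    # with uniform error 4^{-(m+1)}; each extra tooth corresponds to a few
    # additional ReLU layers in a network realization.
    out, g = x, x
    for s in range(1, m + 1):
        g = hat(g)
        out = out - g / 4.0 ** s
    return out

def mult_approx(x, y, m):
    # Polarization identity x*y = 2*((x+y)/2)^2 - x^2/2 - y^2/2, with the
    # exact square replaced by its depth-m ReLU approximation.
    return (2.0 * square_approx((x + y) / 2.0, m)
            - 0.5 * square_approx(x, m)
            - 0.5 * square_approx(y, m))

def product_approx(x, m):
    # Binary tree of approximate pairwise products: depth ~ m * log2(d).
    vals = list(x)
    while len(vals) > 1:
        nxt = [mult_approx(vals[i], vals[i + 1], m)
               for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2 == 1:      # an odd leftover passes through unchanged
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m = 8, 12
    x = rng.uniform(0.0, 1.0, size=d)
    print("approx:", product_approx(x, m), " exact:", np.prod(x))
```

Running the script with, say, $d=8$ and $m=12$ reproduces the exact product to roughly single-precision accuracy with a parameter count polynomial in $d$ and $m$, whereas shallow or insufficiently deep ReLU networks cannot achieve such accuracy without the curse of dimensionality, which is the kind of separation the paper establishes rigorously.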
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with an arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d$ …
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Deep neural networks with ReLU, leaky ReLU, and softplus activation
provably overcome the curse of dimensionality for Kolmogorov partial
differential equations with Lipschitz nonlinearities in the $L^p$-sense [3.3123773366516645]
We show that deep neural networks (DNNs) have the expressive power to approximate PDE solutions without the curse of dimensionality (COD).
It is the key contribution of this work to generalize this result by establishing this statement in the $L^p$-sense with $p\in(0,\infty)$.
arXiv Detail & Related papers (2023-09-24T18:58:18Z) - Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization [1.189367612437469]
We consider shallow neural networks with one hidden layer, a ReLU activation function, and an ${\mathcal L}^2$ Schatten-class (or Hilbert-Schmidt) cost function.
We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$, where $\delta_P$ measures the signal-to-noise ratio of the training inputs.
In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function and show that the sharp value differs from the upper bound obtained for $Q\leq M$ by a …
arXiv Detail & Related papers (2023-09-19T07:12:41Z) - Tractability of approximation by general shallow networks [0.0]
We consider approximation of functions of the form $x\mapsto\int_{\mathbb{Y}} G(x,y)\,d\tau(y)$, $x\in\mathbb{X}$, by $G$-networks of the form $x\mapsto\sum_{k=1}^{n} a_k G(x,y_k)$.
We obtain dimension-independent bounds on the degree of approximation in terms of $n$, where the constants involved still depend on the dimensions (a Monte Carlo sketch of this kind of $G$-network approximation is given after this list).
arXiv Detail & Related papers (2023-08-07T00:14:46Z) - Optimal Approximation Rates for Deep ReLU Neural Networks on Sobolev and Besov Spaces [2.7195102129095003]
Deep neural networks with the ReLU activation function can approximate functions in the Sobolev spaces $W^s(L_q(\Omega))$ and Besov spaces $B^s_r(L_q(\Omega))$.
This problem is important when studying the application of neural networks in a variety of fields.
arXiv Detail & Related papers (2022-11-25T23:32:26Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Deep Learning in High Dimension: Neural Network Approximation of
Analytic Functions in $L^2(\mathbb{R}^d,\gamma_d)$ [0.0]
We prove expression rates for analytic functions $f:\mathbb{R}^d\to\mathbb{R}$ in the norm of $L^2(\mathbb{R}^d,\gamma_d)$.
We consider in particular ReLU and ReLU$^k$ activations for integer $k\geq 2$.
As an application, we prove expression rate bounds of deep ReLU-NNs for response surfaces of elliptic PDEs with log-Gaussian random field inputs.
arXiv Detail & Related papers (2021-11-13T09:54:32Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z)