Related papers: Deep neural network approximation of analytic functions

URL: http://arxiv.org/abs/2104.02095v1
Date: Mon, 5 Apr 2021 18:02:04 GMT
Title: Deep neural network approximation of analytic functions
Authors: Aleksandr Beknazaryan
Abstract summary: We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions. We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
Score: 91.3755431537592
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions, such as the ReLU and the absolute value functions. This bound generalizes the known entropy bound for the space of linear functions on $\mathbb{R}^d$ and it depends on the value at the point $(1,1,...,1)$ of the networks obtained by taking the absolute values of all parameters of original networks. Keeping this value together with the depth, width and the parameters of the networks to have logarithmic dependence on $1/\varepsilon$, we $\varepsilon$-approximate functions that are analytic on certain regions of $\mathbb{C}^d$. As a statistical application we derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
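The entropy bound depends on the value at $(1,1,\dots,1)$ of the network obtained by replacing every parameter with its absolute value. A minimal sketch of that quantity follows, assuming a fully connected network $x \mapsto W_L\,\sigma(\cdots\sigma(W_1x+b_1)\cdots)+b_L$ with ReLU $\sigma$; the list-of-`(W, b)` layer format and the names `forward` and `abs_network_at_ones` are hypothetical and do not come from the paper.

```python
# Hypothetical layer format: a list of (W, b) pairs, W given as a list of rows.

def relu(v):
    return [max(0.0, t) for t in v]

def affine(W, b, x):
    # Computes W x + b for plain Python lists.
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(layers, x):
    # ReLU after every layer except the last (assumed architecture).
    for k, (W, b) in enumerate(layers):
        x = affine(W, b, x)
        if k < len(layers) - 1:
            x = relu(x)
    return x

def abs_network_at_ones(layers):
    # Replace every weight and bias by its absolute value, then evaluate at (1, ..., 1).
    abs_layers = [([[abs(w) for w in row] for row in W], [abs(t) for t in b])
                  for W, b in layers]
    d = len(layers[0][0][0])  # input dimension = number of columns of W_1
    return forward(abs_layers, [1.0] * d)

example = [
    ([[1.0, -2.0], [0.5, 3.0]], [-1.0, 0.0]),  # W_1, b_1
    ([[2.0, -1.0]], [0.5]),                     # W_2, b_2
]
print(forward(example, [1.0, 1.0]))  # [-3.0]
print(abs_network_at_ones(example))  # [12.0]
```

Because every parameter and every input coordinate of the absolute-value network is nonnegative, each ReLU in it acts as the identity; the same is true of the absolute value activation $|\cdot|$, so the quantity is identical for both activations named in the abstract. It also bounds the original network: $|f(x)| \le$ `abs_network_at_ones` for every $x \in [-1,1]^d$.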

Related papers

Nonlocal techniques for the analysis of deep ReLU neural network approximations [0.0]
Recently, Daubechies, DeVore, Foucart, Hanin, and Petrova introduced a system of piecewise linear functions that can be reproduced by ReLU networks and forms a Riesz basis of $L_2([0,1])$. We show that this system also serves as a Riesz basis for Sobolev spaces and Barron classes.
arXiv Detail & Related papers (2025-04-07T09:00:22Z)
Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor program framework. We show that, when trained with stochastic gradient descent (SGD), these networks learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
Constructive Universal Approximation and Finite Sample Memorization by Narrow Deep ReLU Networks [0.0]
We show that any dataset with $N$ distinct points in $\mathbb{R}^d$ and $M$ output classes can be exactly classified. We also prove a universal approximation theorem in $L^p(\Omega; \mathbb{R}^m)$ for any bounded domain. Our results offer a unified and interpretable framework connecting controllability, expressivity, and training dynamics in deep neural networks.
arXiv Detail & Related papers (2024-09-10T14:31:21Z)
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks. In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
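For readers unfamiliar with gradient descent-ascent, the basic update descends in the minimizing variable and ascends in the maximizing one at the same time. The sketch below runs plain finite-dimensional simultaneous GDA on a toy convex-concave objective chosen for illustration; it is not the mean-field neural version analyzed in the paper, and the objective, step size `eta` and step count are assumptions.

```python
def gda(grad_x, grad_y, x, y, eta=0.1, steps=200):
    # Simultaneous updates: descend in x, ascend in y, both using the current point.
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - eta * gx, y + eta * gy
    return x, y

# Toy objective f(x, y) = x**2 / 2 + x * y - y**2 / 2:
# convex in x, concave in y, with its unique saddle point at (0, 0).
def grad_x(x, y):
    return x + y

def grad_y(x, y):
    return x - y

x_star, y_star = gda(grad_x, grad_y, 1.0, 1.0)
print(x_star, y_star)  # both close to 0
```

For this objective one step multiplies $(x, y)$ by $\begin{pmatrix}1-\eta & -\eta\\ \eta & 1-\eta\end{pmatrix}$, a scaled rotation with factor $\sqrt{1-2\eta+2\eta^2}\approx 0.906$ at $\eta = 0.1$, so the iterates spiral into the saddle point.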
Shallow neural network representation of polynomials [91.3755431537592]
We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}\left[\binom{r+d-1}{d-1}+1\right]$.
arXiv Detail & Related papers (2022-08-17T08:14:52Z)
Scalable Lipschitz Residual Networks with Convex Potential Flows [120.27516256281359]
We show that using convex potentials in a residual network gradient flow provides a built-in $1$-Lipschitz transformation. A comprehensive set of experiments on CIFAR-10 demonstrates the scalability of our architecture and the benefit of our approach for $\ell_2$ provable defenses.
arXiv Detail & Related papers (2021-10-25T07:12:53Z)
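The 1-Lipschitz property can be illustrated with a residual step along the gradient of a convex potential. The sketch below assumes the layer $x \mapsto x - h\,W^\top \mathrm{ReLU}(Wx+b)$, where $W^\top\mathrm{ReLU}(Wx+b)$ is the gradient of the convex, $\|W\|_2^2$-smooth potential $\psi(x)=\sum_i \tfrac12\max(w_i\cdot x+b_i,0)^2$; a gradient step on such a potential is nonexpansive whenever $h \le 2/\|W\|_2^2$. The summary above does not spell out the paper's layer, and using the Frobenius norm (an upper bound on $\|W\|_2$) to set `h` is a conservative simplification chosen here.

```python
import math
import random

def convex_potential_step(W, b, x):
    """One residual step x - h * W^T ReLU(W x + b).

    W^T ReLU(W x + b) is the gradient of the convex potential
    psi(x) = sum_i r(w_i . x + b_i) with r(t) = max(t, 0)**2 / 2.
    """
    # Step size h = 2 / ||W||_F^2. Since ||W||_F >= ||W||_2, this is at most
    # 2 / ||W||_2^2, which keeps the step nonexpansive (1-Lipschitz).
    h = 2.0 / sum(w * w for row in W for w in row)
    pre = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    act = [max(0.0, t) for t in pre]
    # grad_j = sum_i W[i][j] * act[i], i.e. W^T act
    grad = [sum(W[i][j] * act[i] for i in range(len(W))) for j in range(len(x))]
    return [xj - h * gj for xj, gj in zip(x, grad)]

def dist(u, v):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

random.seed(0)
W = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]  # 4 units, input dim 3
b = [random.gauss(0, 1) for _ in range(4)]
worst = 0.0
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(3)]
    y = [random.gauss(0, 1) for _ in range(3)]
    ratio = dist(convex_potential_step(W, b, x), convex_potential_step(W, b, y)) / dist(x, y)
    worst = max(worst, ratio)
print(worst)  # empirical Lipschitz ratio, expected to be at most 1
```

Because each such step is 1-Lipschitz, a stack of them is also 1-Lipschitz, which is what makes the residual architecture attractive for certified robustness.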
Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions [7.943024117353317]
We consider a family of deep neural networks consisting of two groups of convolutional layers, a downsampling operator, and a fully connected layer. The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer.
arXiv Detail & Related papers (2021-07-02T08:22:12Z)
Neural networks with superexpressive activations and integer weights [91.3755431537592]
An example of an activation function $\sigma$ is given such that networks with activations $\{\sigma, \lfloor\cdot\rfloor\}$, integer weights and a fixed architecture depending on $d$ approximate continuous functions on $[0,1]^d$. The range of integer weights required for $\varepsilon$-approximation of Hölder continuous functions is derived.
arXiv Detail & Related papers (2021-05-20T17:29:08Z)
Function approximation by deep neural networks with parameters $\{0,\pm \frac{1}{2}, \pm 1, 2\}$ [91.3755431537592]
It is shown that $C^\beta$-smooth functions can be approximated by neural networks with parameters in $\{0,\pm\frac{1}{2},\pm 1,2\}$. The depth, width and number of active parameters of the constructed networks have, up to a logarithmic factor, the same dependence on the approximation error as those of networks with parameters in $[-1,1]$.
arXiv Detail & Related papers (2021-03-15T19:10:02Z)
Sample Complexity and Overparameterization Bounds for Projection-Free Neural TD Learning [38.730333068555275]
Existing analyses of neural TD learning rely on either infinite-width analysis or constraining the network parameters to a (random) compact set. We show that projection-free TD learning equipped with a two-layer ReLU network of any width exceeding $\mathrm{poly}(\overline{\nu},1/\epsilon)$ converges to the true value function with error $\epsilon$ given $\mathrm{poly}(\overline{\nu},1/\epsilon)$ iterations or samples.
arXiv Detail & Related papers (2021-03-02T01:05:19Z)
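"Projection-free" means the TD update is applied to the network parameters without projecting them back onto a bounded set afterwards. A minimal sketch of one semi-gradient TD(0) step with a two-layer ReLU value function follows; the parameterization $V(s)=\frac{1}{\sqrt m}\sum_j a_j\,\mathrm{ReLU}(w_j\cdot s)$ with fixed output weights $a_j\in\{\pm1\}$, the discount `gamma`, the step size `alpha` and the toy transition are assumptions for illustration, not details taken from this abstract.

```python
import math
import random

def value(W, a, s):
    # Two-layer ReLU value function V(s) = (1/sqrt(m)) * sum_j a_j * ReLU(w_j . s).
    return sum(aj * max(0.0, sum(wk * sk for wk, sk in zip(wj, s)))
               for wj, aj in zip(W, a)) / math.sqrt(len(a))

def td0_update(W, a, s, r, s_next, gamma=0.9, alpha=0.05):
    """One semi-gradient TD(0) step on the hidden weights W; no projection afterwards."""
    delta = r + gamma * value(W, a, s_next) - value(W, a, s)  # TD error
    new_W = []
    for wj, aj in zip(W, a):
        active = sum(wk * sk for wk, sk in zip(wj, s)) > 0
        # dV(s)/dw_j = a_j * 1[w_j . s > 0] * s / sqrt(m)
        coef = alpha * delta * aj / math.sqrt(len(a)) if active else 0.0
        new_W.append([wk + coef * sk for wk, sk in zip(wj, s)])
    return new_W, delta

random.seed(1)
m, d = 8, 3
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
a = [random.choice([-1.0, 1.0]) for _ in range(m)]  # output weights kept fixed
s, s_next, r = [1.0, 0.5, -0.2], [0.0, 1.0, 0.3], 1.0
before = value(W, a, s)
W, delta = td0_update(W, a, s, r, s_next)
after = value(W, a, s)
print(delta, before, after)
```

For a ReLU network of this form, a single step never moves $V(s)$ against the sign of the TD error: each active unit's contribution changes in the direction of `delta`, and inactive units are untouched.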
Theory of Deep Convolutional Neural Networks II: Spherical Analysis [9.099589602551573]
We consider a family of deep convolutional neural networks applied to approximate functions on the unit sphere $\mathbb{S}^{d-1}$ of $\mathbb{R}^d$. Our analysis presents rates of uniform approximation when the approximated function lies in the Sobolev space $W^r_\infty(\mathbb{S}^{d-1})$ with $r>0$ or takes an additive ridge form.
arXiv Detail & Related papers (2020-07-28T14:54:30Z)
Nonclosedness of Sets of Neural Networks in Sobolev Spaces [0.0]
We show that realized neural networks are not closed in order-$(m-1)$ Sobolev spaces $W^{m-1,p}$ for $p \in [1,\infty]$. For a real analytic activation function, we show that sets of realized neural networks are not closed in $W^{k,p}$ for any $k \in \mathbb{N}$.
arXiv Detail & Related papers (2020-07-23T00:57:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.