Deep Neural Networks: Multi-Classification and Universal Approximation
- URL: http://arxiv.org/abs/2409.06555v1
- Date: Tue, 10 Sep 2024 14:31:21 GMT
- Title: Deep Neural Networks: Multi-Classification and Universal Approximation
- Authors: Martín Hernández, Enrique Zuazua,
- Abstract summary: We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W1,p$ functions and width estimates for approximating $Lp(Omega;mathbbRm)$ for $mgeq1$.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements in $\mathbb{R}^d$, where $d\ge1,$ and $M$ classes, thereby ensuring accurate classification. By modeling the neural network as a time-discrete nonlinear dynamical system, we interpret the memorization property as a problem of simultaneous or ensemble controllability. This problem is addressed by constructing the network parameters inductively and explicitly, bypassing the need for training or solving any optimization problem. Additionally, we establish that such a network can achieve universal approximation in $L^p(\Omega;\mathbb{R}_+)$, where $\Omega$ is a bounded subset of $\mathbb{R}^d$ and $p\in[1,\infty)$, using a ReLU deep neural network with a width of $d+1$. We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq1$. Our proofs are constructive, offering explicit values for the biases and weights involved.
Related papers
- New advances in universal approximation with neural networks of minimal width [4.424170214926035]
We show that autoencoders with leaky ReLU activations are universal approximators of $Lp$ functions.
We broaden our results to show that smooth invertible neural networks can approximate $Lp(mathbbRd,mathbbRd)$ on compacta.
arXiv Detail & Related papers (2024-11-13T16:17:16Z) - Implicit Hypersurface Approximation Capacity in Deep ReLU Networks [0.0]
We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations.
We show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation as its zero contour.
arXiv Detail & Related papers (2024-07-04T11:34:42Z) - Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree $k$s $p$, a three-layer neural network trained via layerwise gradientp descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Rates of Approximation by ReLU Shallow Neural Networks [8.22379888383833]
We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the H"older space.
Such rates are very close to the optimal one $O(m-fracrd)$ in the sense that $fracd+2d+4d+4$ is close to $1$, when the dimension $d$ is large.
arXiv Detail & Related papers (2023-07-24T00:16:50Z) - Most Neural Networks Are Almost Learnable [52.40331776572531]
We show that for any fixed $epsilon>0$ and depth $i$, there is a poly-time algorithm that learns random Xavier networks of depth $i$.
The algorithm runs in time and sample complexity of $(bard)mathrmpoly(epsilon-1)$, where $bar d$ is the size of the network.
For some cases of sigmoid and ReLU-like activations the bound can be improved to $(bard)mathrmpolylog(eps
arXiv Detail & Related papers (2023-05-25T22:27:42Z) - Generalization Ability of Wide Neural Networks on $\mathbb{R}$ [8.508360765158326]
We study the generalization ability of the wide two-layer ReLU neural network on $mathbbR$.
We show that: $i)$ when the width $mrightarrowinfty$, the neural network kernel (NNK) uniformly converges to the NTK; $ii)$ the minimax rate of regression over the RKHS associated to $K_1$ is $n-2/3$; $iii)$ if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; $iv
arXiv Detail & Related papers (2023-02-12T15:07:27Z) - Achieve the Minimum Width of Neural Networks for Universal Approximation [1.52292571922932]
We study the exact minimum width, $w_min$, for the universal approximation property (UAP) of neural networks.
In particular, the critical width, $w*_min$, for $Lp$-UAP can be achieved by leaky-ReLU networks.
arXiv Detail & Related papers (2022-09-23T04:03:50Z) - Understanding Deep Neural Function Approximation in Reinforcement
Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value based algorithm with the $epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L2(mathrmdmu)$-integrable space over a certain averaged measure $mu$, and transforms it to a generalization problem under the non-iid setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z) - On the Optimal Memorization Power of ReLU Neural Networks [53.15475693468925]
We show that feedforward ReLU neural networks can memorization any $N$ points that satisfy a mild separability assumption.
We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
arXiv Detail & Related papers (2021-10-07T05:25:23Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamic of descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn with gradient loss at most ground with Tangent samples.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.