Optimal Neural Network Approximation for High-Dimensional Continuous Functions
- URL: http://arxiv.org/abs/2409.02363v3
- Date: Thu, 06 Feb 2025 22:37:37 GMT
- Title: Optimal Neural Network Approximation for High-Dimensional Continuous Functions
- Authors: Ayan Maiti, Michelle Michelle, Haizhao Yang
- Abstract summary: We present a family of continuous functions that requires at least width $d$, and thus at least $d$ neurons or parameters, to achieve arbitrary accuracy in its approximation.
This suggests that the number of unique nonzero parameters is optimal in the sense that it grows linearly with the input dimension $d$, unlike some approximation methods where parameters may grow exponentially with $d$.
- Score: 5.748690310135373
- License:
- Abstract: Recently, the authors of Shen, Yang, and Zhang (JMLR, 2022) developed a neural network with width $36d(2d + 1)$ and depth $11$, which utilizes a special activation function called the elementary universal activation function, to achieve the super approximation property for functions in $C([a,b]^d)$. That is, the constructed network only requires a fixed number of neurons (and thus parameters) to approximate a $d$-variate continuous function on a $d$-dimensional hypercube with arbitrary accuracy. More specifically, only $\mathcal{O}(d^2)$ neurons or parameters are used. One natural question is whether we can reduce the number of these neurons or parameters in such a network. By leveraging a variant of the Kolmogorov Superposition Theorem, our analysis shows that there is a neural network generated by the elementary universal activation function with at most $10889d+10887$ unique nonzero parameters such that this super approximation property is attained. Furthermore, we present a family of continuous functions that requires at least width $d$, and thus at least $d$ neurons or parameters, to achieve arbitrary accuracy in its approximation. This suggests that the number of unique nonzero parameters is optimal in the sense that it grows linearly with the input dimension $d$, unlike some approximation methods where parameters may grow exponentially with $d$.
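To make the two counts concrete, here is a minimal Python sketch, an illustration only: the helper names are hypothetical, and the paper's construction uses the elementary universal activation function with shared weights rather than a plain constant-width MLP. The sketch simply evaluates the $\mathcal{O}(d^2)$ neuron count of the width-$36d(2d+1)$, depth-$11$ network against the $10889d+10887$ bound on unique nonzero parameters.

```python
# Illustrative bookkeeping for the bounds quoted in the abstract above.
# This is arithmetic on the stated quantities only, not the paper's construction.

def shen_yang_zhang_width(d: int) -> int:
    """Width 36d(2d + 1) of the network in Shen, Yang, and Zhang (JMLR, 2022)."""
    return 36 * d * (2 * d + 1)

def neuron_count(d: int, depth: int = 11) -> int:
    """Total hidden neurons of a constant-width network of the given depth: O(d^2) for depth 11."""
    return depth * shen_yang_zhang_width(d)

def unique_nonzero_parameter_bound(d: int) -> int:
    """Linear-in-d bound 10889d + 10887 on unique nonzero parameters."""
    return 10889 * d + 10887

if __name__ == "__main__":
    for d in (1, 2, 10, 100):
        print(f"d={d:>3}  neurons={neuron_count(d):>10,}  "
              f"unique nonzero parameters <= {unique_nonzero_parameter_bound(d):>9,}")
```

Running the loop shows the quadratic growth of the neuron count against the linear growth of the parameter bound as $d$ increases.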
Related papers
- Interpolation with deep neural networks with non-polynomial activations: necessary and sufficient numbers of neurons [0.0]
We prove that $\Theta(\sqrt{nd'})$ neurons are sufficient as long as the activation function is real analytic at a point and not a polynomial there.
This means that activation functions can be freely chosen in a problem-dependent manner without loss of power.
arXiv Detail & Related papers (2024-05-22T15:29:45Z) - Optimal approximation using complex-valued neural networks [0.0]
Complex-valued neural networks (CVNNs) have recently shown promising empirical success.
We analyze the expressivity of CVNNs by studying their approximation properties.
arXiv Detail & Related papers (2023-03-29T15:56:43Z) - Shallow neural network representation of polynomials [91.3755431537592]
We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}\big[\binom{r+d-1}{d-1}+1\big]$.
arXiv Detail & Related papers (2022-08-17T08:14:52Z) - Deep Network Approximation: Achieving Arbitrary Accuracy with Fixed Number of Neurons [5.37133760455631]
We develop feed-forward neural networks that achieve the universal approximation property for all continuous functions with a fixed finite number of neurons.
We prove that $\sigma$-activated networks with width $36d(2d+1)$ and depth $11$ can approximate any continuous function on a $d$-dimensional hypercube within an arbitrarily small error.
arXiv Detail & Related papers (2021-07-06T05:24:30Z) - Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - Function approximation by deep neural networks with parameters $\{0, \pm \frac{1}{2}, \pm 1, 2\}$ [91.3755431537592]
It is shown that $C_\beta$-smooth functions can be approximated by neural networks with parameters $\{0, \pm \frac{1}{2}, \pm 1, 2\}$.
The depth, width, and number of active parameters of the constructed networks have, up to a logarithmic factor, the same dependence on the approximation error as networks with parameters in $[-1,1]$.
arXiv Detail & Related papers (2021-03-15T19:10:02Z) - Size and Depth Separation in Approximating Natural Functions with Neural Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z) - Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z) - Nonclosedness of Sets of Neural Networks in Sobolev Spaces [0.0]
We show that realized neural networks are not closed in order-$(m-1)$ Sobolev spaces $W^{m-1,p}$ for $p \in [1,\infty]$.
For a real analytic activation function, we show that sets of realized neural networks are not closed in $W^{k,p}$ for any $k \in \mathbb{N}$.
arXiv Detail & Related papers (2020-07-23T00:57:25Z) - Interval Universal Approximation for Neural Networks [47.767793120249095]
We introduce the interval universal approximation (IUA) theorem.
IUA shows that neural networks can approximate any continuous function $f$, as has been known for decades, using networks that are amenable to precise interval analysis.
We study the computational complexity of constructing neural networks that are amenable to precise interval analysis.
arXiv Detail & Related papers (2020-07-12T20:43:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.