Shallow neural network representation of polynomials
- URL: http://arxiv.org/abs/2208.08138v2
- Date: Thu, 18 Aug 2022 09:35:35 GMT
- Title: Shallow neural network representation of polynomials
- Authors: Aleksandr Beknazaryan
- Abstract summary: We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^R\binom{r+d-1}{d-1}[\binom{r+d-1}{d-1}+1]$.
- Score: 91.3755431537592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that $d$-variate polynomials of degree $R$ can be represented on
$[0,1]^d$ as shallow neural networks of width
$d+1+\sum_{r=2}^R\binom{r+d-1}{d-1}[\binom{r+d-1}{d-1}+1]$. Also, by SNN
representation of localized Taylor polynomials of univariate $C^\beta$-smooth
functions, we derive for shallow networks the minimax optimal rate of
convergence, up to a logarithmic factor, to an unknown univariate regression
function.
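As a quick reference, the following is a minimal Python sketch (not taken from the paper) that evaluates the width bound stated in the abstract for given $d$ and $R$; the function name `snn_width` and the example values $d=2$, $R=3$ are illustrative assumptions.

```python
# Minimal sketch: evaluate the shallow-network width bound from the abstract,
#   d + 1 + sum_{r=2}^{R} C(r+d-1, d-1) * [C(r+d-1, d-1) + 1],
# for input dimension d and polynomial degree R. Function name and the example
# values of d, R are illustrative, not from the paper.
from math import comb


def snn_width(d: int, R: int) -> int:
    """Width bound for representing d-variate degree-R polynomials on [0,1]^d."""
    width = d + 1
    for r in range(2, R + 1):
        m = comb(r + d - 1, d - 1)  # number of degree-r monomials in d variables
        width += m * (m + 1)
    return width


if __name__ == "__main__":
    # e.g. bivariate cubics: d = 2, R = 3
    print(snn_width(2, 3))
```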
Related papers
- Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform [4.096453902709292]
We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces.
We provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$.
arXiv Detail & Related papers (2024-08-20T16:43:45Z) - Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations [40.77319247558742]
We study the computational complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure.
We prove that a large subset of such functions $f_*$ can be efficiently learned by gradient training of a two-layer neural network.
arXiv Detail & Related papers (2024-06-17T17:59:17Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data (a minimal illustrative sketch of this setup appears after this list).
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d$
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Generalization and Stability of Interpolating Neural Networks with
Minimal Width [37.908159361149835]
We investigate the generalization and optimization of shallow neural networks trained by gradient descent in the interpolating regime.
We prove convergence of the training loss with a minimal number of neurons $m=\Omega(\log^4(n))$ and $T\approx n$ iterations.
Under the same conditions, we bound the test loss by $\tilde{O}(1/n)$.
arXiv Detail & Related papers (2023-02-18T05:06:15Z) - Neural Networks can Learn Representations with Gradient Descent [68.95262816363288]
In specific regimes, neural networks trained by gradient descent behave like kernel methods.
In practice, it is known that neural networks strongly outperform their associated kernels.
arXiv Detail & Related papers (2022-06-30T09:24:02Z) - Expressive power of binary and ternary neural networks [91.3755431537592]
We show that deep sparse ReLU networks with ternary weights and deep ReLU networks with binary weights can approximate $\beta$-Hölder functions on $[0,1]^d$.
arXiv Detail & Related papers (2022-06-27T13:16:08Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network trained by gradient descent can provably learn the ground-truth network from samples, with guarantees beyond the Neural Tangent Kernel regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - A Corrective View of Neural Networks: Representation, Memorization and
Learning [26.87238691716307]
We develop a corrective mechanism for neural network approximation.
We show that two-layer neural networks in the random features regime (RF) can memorize arbitrary labels.
We also consider three-layer neural networks and show that the corrective mechanism yields faster representation rates for smooth radial functions.
arXiv Detail & Related papers (2020-02-01T20:51:09Z)
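As referenced in the single-index entry above, here is a minimal sketch of that setting: online SGD on a two-layer ReLU network fitting $f_*(\boldsymbol{x}) = \sigma_*(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle)$ with isotropic Gaussian inputs. The quadratic link function, network width, step size, and step count are illustrative assumptions, not values or algorithms from that paper.

```python
# Minimal sketch (not the paper's algorithm or hyperparameters): online SGD on a
# two-layer ReLU network fitting a single-index target f_*(x) = sigma_*(<x, theta>)
# under isotropic Gaussian data. Link function, width, step size and step count
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, width, lr, steps = 16, 64, 1e-3, 50_000

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)               # unit index direction
sigma_star = lambda z: z ** 2                # example link function

W = rng.standard_normal((width, d)) / np.sqrt(d)   # hidden-layer weights
b = np.zeros(width)                                # hidden-layer biases
a = rng.standard_normal(width) / np.sqrt(width)    # output weights
c = 0.0                                            # output bias

for _ in range(steps):
    x = rng.standard_normal(d)               # fresh isotropic Gaussian sample
    y = sigma_star(x @ theta)
    h = np.maximum(W @ x + b, 0.0)           # ReLU activations
    err = (a @ h + c) - y                    # residual of the squared loss
    grad_a, grad_c = err * h, err
    grad_b = err * a * (h > 0)
    grad_W = np.outer(grad_b, x)
    a -= lr * grad_a
    c -= lr * grad_c
    b -= lr * grad_b
    W -= lr * grad_W

# crude generalization check on fresh samples
X = rng.standard_normal((2000, d))
pred = np.maximum(X @ W.T + b, 0.0) @ a + c
print("test MSE:", np.mean((pred - sigma_star(X @ theta)) ** 2))
```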