Size and Depth Separation in Approximating Natural Functions with Neural Networks
- URL: http://arxiv.org/abs/2102.00314v2
- Date: Wed, 3 Feb 2021 01:51:12 GMT
- Title: Size and Depth Separation in Approximating Natural Functions with Neural Networks
- Authors: Gal Vardi, Daniel Reichman, Toniann Pitassi, Ohad Shamir
- Abstract summary: We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function that can be approximated with networks of size $O(d)$ but not with networks of size $o(d/\log d)$.
- Score: 52.73592689730044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When studying the expressive power of neural networks, a main challenge is to
understand how the size and depth of the network affect its ability to
approximate real functions. However, not all functions are interesting from a
practical viewpoint: functions of interest usually have a polynomially-bounded
Lipschitz constant, and can be computed efficiently. We call functions that
satisfy these conditions "natural", and explore the benefits of size and depth
for approximation of natural functions with ReLU networks. As we show, this
problem is more challenging than the corresponding problem for non-natural
functions. We give barriers to showing depth-lower-bounds: Proving existence of
a natural function that cannot be approximated by polynomial-size networks of
depth $4$ would settle longstanding open problems in computational complexity.
It implies that beyond depth $4$ there is a barrier to showing depth-separation
for natural functions, even between networks of constant depth and networks of
nonconstant depth. We also study size-separation, namely, whether there are
natural functions that can be approximated with networks of size $O(s(d))$, but
not with networks of size $O(s'(d))$. We show a complexity-theoretic barrier to
proving such results beyond size $O(d\log^2(d))$, but also show an explicit
natural function that can be approximated with networks of size $O(d)$ and not
with networks of size $o(d/\log d)$. For approximation in $L_\infty$ we achieve
such separation already between size $O(d)$ and size $o(d)$. Moreover, we show
superpolynomial size lower bounds and barriers to such lower bounds, depending
on the assumptions on the function. Our size-separation results rely on an
analysis of size lower bounds for Boolean functions, which is of independent
interest: We show linear size lower bounds for computing explicit Boolean
functions with neural networks and threshold circuits.
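As a reading aid, the abstract's notion of a "natural" function and its main size-separation claim can be restated informally as follows; this is a paraphrase of the abstract, and the precise domain, approximation accuracy, and quantifiers are as defined in the paper:
\[
f \text{ is natural:} \quad \mathrm{Lip}(f) \le \mathrm{poly}(d) \quad \text{and} \quad f \text{ is computable in } \mathrm{poly}(d) \text{ time},
\]
\[
\text{size separation:} \quad \exists f \text{ natural that is approximable by ReLU networks of size } O(d) \text{ but not of size } o(d/\log d).
\]
For approximation in $L_\infty$, the abstract reports the stronger gap of size $O(d)$ versus size $o(d)$.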
Related papers
- Neural Networks and (Virtual) Extended Formulations [5.762677915745415]
We make a step towards proving lower bounds on the size of neural networks by linking their representative capabilities to the notion of the extension complexity $\mathrm{xc}(P)$.
We show that powerful results on the ordinary extension complexity can be converted into lower bounds for monotone neural networks.
arXiv Detail & Related papers (2024-11-05T11:12:11Z)
- Expressivity and Approximation Properties of Deep Neural Networks with ReLU$^k$ Activation [2.3020018305241337]
We investigate the expressivity and approximation properties of deep networks employing the ReLU$^k$ activation function for $k \geq 2$.
Although deep ReLU networks can approximate ReLU$^k$ networks effectively, deep ReLU$^k$ networks have the capability to represent higher-degree polynomials precisely.
arXiv Detail & Related papers (2023-12-27T09:11:14Z)
- Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value based algorithm with the $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-iid setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
- Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions [7.943024117353317]
We consider a family of deep neural networks consisting of two groups of convolutional layers, a down operator, and a fully connected layer.
The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer.
arXiv Detail & Related papers (2021-07-02T08:22:12Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- A deep network construction that adapts to intrinsic dimensionality beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z)
- Interval Universal Approximation for Neural Networks [47.767793120249095]
We introduce the interval universal approximation (IUA) theorem.
IUA shows that neural networks can approximate any continuous function $f$, as has been known for decades.
We study the computational complexity of constructing neural networks that are amenable to precise interval analysis.
arXiv Detail & Related papers (2020-07-12T20:43:56Z)
- Neural Networks with Small Weights and Depth-Separation Barriers [40.66211670342284]
For constant depths, existing results are limited to depths $2$ and $3$, and achieving results for higher depths has been an important question.
We focus on feedforward ReLU networks, and prove fundamental barriers to proving such results beyond depth $4$.
arXiv Detail & Related papers (2020-05-31T21:56:17Z)
- On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.