How Many Neurons Does it Take to Approximate the Maximum?
- URL: http://arxiv.org/abs/2307.09212v2
- Date: Tue, 7 Nov 2023 17:50:27 GMT
- Title: How Many Neurons Does it Take to Approximate the Maximum?
- Authors: Itay Safran, Daniel Reichman, Paul Valiant
- Abstract summary: We study the size of a neural network needed to approximate the maximum function over $d$ inputs.
We provide new lower and upper bounds on the width required for approximation across various depths.
- Score: 10.995895410470279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the size of a neural network needed to approximate the maximum
function over $d$ inputs, in the most basic setting of approximating with
respect to the $L_2$ norm, for continuous distributions, for a network that
uses ReLU activations. We provide new lower and upper bounds on the width
required for approximation across various depths. Our results establish new
depth separations between depth 2 and 3, and depth 3 and 5 networks, as well as
providing a depth $\mathcal{O}(\log(\log(d)))$ and width $\mathcal{O}(d)$
construction which approximates the maximum function. Our depth separation
results are facilitated by a new lower bound for depth 2 networks approximating
the maximum function over the uniform distribution, assuming an exponential
upper bound on the size of the weights. Furthermore, we are able to use this
depth 2 lower bound to provide tight bounds on the number of neurons needed to
approximate the maximum by a depth 3 network. Our lower bounds are of
potentially broad interest as they apply to the widely studied and used
\emph{max} function, in contrast to many previous results that base their
bounds on specially constructed or pathological functions and distributions.
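For intuition, the classical baseline that the paper's $\mathcal{O}(\log(\log(d)))$-depth, $\mathcal{O}(d)$-width construction improves on is the binary-tree ("tournament") network, which computes the maximum of $d$ inputs exactly in $\lceil\log_2 d\rceil$ ReLU layers using the identity $\max(a,b) = b + \mathrm{ReLU}(a-b)$. The NumPy sketch below illustrates only this standard baseline, not the paper's construction; the function names are illustrative.

```python
import numpy as np

def relu(x):
    """ReLU activation."""
    return np.maximum(x, 0.0)

def pairwise_max(a, b):
    # Exact ReLU identity: max(a, b) = b + relu(a - b).
    # Inside a pure ReLU network the "+ b" pass-through costs two extra units
    # per pair (b = relu(b) - relu(-b)); that bookkeeping is omitted here.
    return b + relu(a - b)

def tournament_max(x):
    """Classical binary-tree construction: the exact maximum of d inputs via
    ceil(log2(d)) layers of pairwise maxima, O(d) units in total. This is the
    standard baseline only; the paper's O(log log d)-depth approximation is a
    different, more involved construction."""
    x = np.asarray(x, dtype=float)
    while x.size > 1:
        if x.size % 2 == 1:                 # odd count: carry the last input forward
            x = np.append(x, x[-1])
        x = pairwise_max(x[0::2], x[1::2])  # one layer of pairwise maxima
    return x[0]

print(tournament_max([0.3, -1.2, 0.9, 0.05]))  # -> 0.9
```

The $\lceil\log_2 d\rceil$ depth of this exact construction matches (up to the integrality assumption) the exact-computation lower bound of the "Lattice Polytopes" paper listed below; the present paper shows that settling for $L_2$-approximation lets the depth drop from $\Theta(\log d)$ to $\mathcal{O}(\log(\log(d)))$.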
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z)
- Depth Separations in Neural Networks: Separating the Dimension from the Accuracy [9.783697404304027]
We prove an exponential size separation between depth 2 and depth 3 neural networks (with real inputs).
We show that the curse of dimensionality manifests itself in depth 2 approximation, even in cases where the target function can be represented efficiently using a depth 3 network.
arXiv Detail & Related papers (2024-02-11T17:27:26Z)
- Neural networks: deep, shallow, or in between? [0.6043356028687779]
We give estimates for the error of approximation of a compact subset from a Banach space by the outputs of feed-forward neural networks with width W, depth l and Lipschitz activation functions.
We show that, modulo logarithmic factors, rates better than the entropy-number rates are attainable, if at all, only for neural networks whose depth l goes to infinity.
arXiv Detail & Related papers (2023-10-11T04:50:28Z)
- Polynomial Width is Sufficient for Set Representation with High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE).
arXiv Detail & Related papers (2023-07-08T16:00:59Z)
- Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z)
- Lower Bounds on the Depth of Integral ReLU Neural Networks via Lattice Polytopes [3.0079490585515343]
We show that $\lceil\log_2(n)\rceil$ hidden layers are indeed necessary to compute the maximum of $n$ numbers.
Our results are based on the known duality between neural networks and Newton polytopes via tropical geometry.
arXiv Detail & Related papers (2023-02-24T10:14:53Z)
- Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it into a generalization problem under the non-i.i.d. setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
- Size and Depth Separation in Approximating Natural Functions with Neural Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z)
- Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks [21.13299067136635]
We provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU networks.
In the finite-width setting, the network architectures we consider are quite general.
arXiv Detail & Related papers (2020-12-21T19:32:17Z)
- Provable Memorization via Deep Neural Networks using Sub-linear Parameters [91.0268925267129]
It is known that $O(N)$ parameters are sufficient for neural networks to memorize arbitrary $N$ input-label pairs.
By exploiting depth, we show that $O(N^{2/3})$ parameters suffice to memorize $N$ pairs, under a mild condition on the separation of input points.
arXiv Detail & Related papers (2020-10-26T06:19:38Z)