How Many Neurons Does it Take to Approximate the Maximum?
- URL: http://arxiv.org/abs/2307.09212v2
- Date: Tue, 7 Nov 2023 17:50:27 GMT
- Title: How Many Neurons Does it Take to Approximate the Maximum?
- Authors: Itay Safran, Daniel Reichman, Paul Valiant
- Abstract summary: We study the size of a neural network needed to approximate the maximum function over $d$ inputs.
We provide new lower and upper bounds on the width required for approximation across various depths.
- Score: 10.995895410470279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the size of a neural network needed to approximate the maximum
function over $d$ inputs, in the most basic setting of approximating with
respect to the $L_2$ norm, for continuous distributions, for a network that
uses ReLU activations. We provide new lower and upper bounds on the width
required for approximation across various depths. Our results establish new
depth separations between depth 2 and 3, and depth 3 and 5 networks, as well as
providing a depth $\mathcal{O}(\log(\log(d)))$ and width $\mathcal{O}(d)$
construction which approximates the maximum function. Our depth separation
results are facilitated by a new lower bound for depth 2 networks approximating
the maximum function over the uniform distribution, assuming an exponential
upper bound on the size of the weights. Furthermore, we are able to use this
depth 2 lower bound to provide tight bounds on the number of neurons needed to
approximate the maximum by a depth 3 network. Our lower bounds are of
potentially broad interest as they apply to the widely studied and used
\emph{max} function, in contrast to many previous results that base their
bounds on specially constructed or pathological functions and distributions.
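For intuition, the classical baseline that the paper's $\mathcal{O}(\log(\log(d)))$-depth, $\mathcal{O}(d)$-width construction improves on is the binary-tree ("tournament") network, which computes the maximum of $d$ inputs exactly in $\lceil\log_2 d\rceil$ ReLU layers using the identity $\max(a,b) = b + \mathrm{ReLU}(a-b)$. The NumPy sketch below illustrates only this standard baseline, not the paper's construction; the function names are illustrative.

```python
import numpy as np

def relu(x):
    """ReLU activation."""
    return np.maximum(x, 0.0)

def pairwise_max(a, b):
    # Exact ReLU identity: max(a, b) = b + relu(a - b).
    # Inside a pure ReLU network the "+ b" pass-through costs two extra units
    # per pair (b = relu(b) - relu(-b)); that bookkeeping is omitted here.
    return b + relu(a - b)

def tournament_max(x):
    """Classical binary-tree construction: the exact maximum of d inputs via
    ceil(log2(d)) layers of pairwise maxima, O(d) units in total. This is the
    standard baseline only; the paper's O(log log d)-depth approximation is a
    different, more involved construction."""
    x = np.asarray(x, dtype=float)
    while x.size > 1:
        if x.size % 2 == 1:                 # odd count: carry the last input forward
            x = np.append(x, x[-1])
        x = pairwise_max(x[0::2], x[1::2])  # one layer of pairwise maxima
    return x[0]

print(tournament_max([0.3, -1.2, 0.9, 0.05]))  # -> 0.9
```

The $\lceil\log_2 d\rceil$ depth of this exact construction matches (up to the integrality assumption) the exact-computation lower bound of the "Lattice Polytopes" paper listed below; the present paper shows that settling for $L_2$-approximation lets the depth drop from $\Theta(\log d)$ to $\mathcal{O}(\log(\log(d)))$.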
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z)
- Depth Separations in Neural Networks: Separating the Dimension from the Accuracy [9.783697404304027]
We prove an exponential size separation between depth 2 and depth 3 neural networks (with real inputs).
We show that the curse of dimensionality manifests itself in depth 2 approximation, even in cases where the target function can be represented efficiently using a depth 3 network.
arXiv Detail & Related papers (2024-02-11T17:27:26Z)
- Neural networks: deep, shallow, or in between? [0.6043356028687779]
We give estimates for the error of approximation of a compact subset from a Banach space by the outputs of feed-forward neural networks with width W, depth l and Lipschitz activation functions.
We show that, modulo logarithmic factors, rates better than the entropy-number rates are attainable, if at all, only for neural networks whose depth l goes to infinity.
arXiv Detail & Related papers (2023-10-11T04:50:28Z)
- Polynomial Width is Sufficient for Set Representation with High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE).
arXiv Detail & Related papers (2023-07-08T16:00:59Z)
- Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z)
- Lower Bounds on the Depth of Integral ReLU Neural Networks via Lattice Polytopes [3.0079490585515343]
We show that $\lceil\log_2(n)\rceil$ hidden layers are indeed necessary to compute the maximum of $n$ numbers.
Our results are based on the known duality between neural networks and Newton polytopes via tropical geometry.
arXiv Detail & Related papers (2023-02-24T10:14:53Z)
- Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it into a generalization problem under the non-i.i.d. setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
- Size and Depth Separation in Approximating Natural Functions with Neural Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z)
- Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks [21.13299067136635]
We provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU networks.
In the finite-width setting, the network architectures we consider are quite general.
arXiv Detail & Related papers (2020-12-21T19:32:17Z)
- Provable Memorization via Deep Neural Networks using Sub-linear Parameters [91.0268925267129]
It is known that $O(N)$ parameters are sufficient for neural networks to memorize arbitrary $N$ input-label pairs.
By exploiting depth, we show that $O(N^{2/3})$ parameters suffice to memorize $N$ pairs, under a mild condition on the separation of input points.
arXiv Detail & Related papers (2020-10-26T06:19:38Z)