On the Optimal Memorization Power of ReLU Neural Networks
- URL: http://arxiv.org/abs/2110.03187v1
- Date: Thu, 7 Oct 2021 05:25:23 GMT
- Title: On the Optimal Memorization Power of ReLU Neural Networks
- Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir
- Abstract summary: We show that feedforward ReLU neural networks can memorize any $N$ points that satisfy a mild separability assumption.
We prove that weights of large bit complexity are both necessary and sufficient for memorization with a sub-linear number of parameters.
- Score: 53.15475693468925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the memorization power of feedforward ReLU neural networks. We show
that such networks can memorize any $N$ points that satisfy a mild separability
assumption using $\tilde{O}\left(\sqrt{N}\right)$ parameters. Known
VC-dimension upper bounds imply that memorizing $N$ samples requires
$\Omega(\sqrt{N})$ parameters, and hence our construction is optimal up to
logarithmic factors. We also give a generalized construction for networks with
depth bounded by $1 \leq L \leq \sqrt{N}$, for memorizing $N$ samples using
$\tilde{O}(N/L)$ parameters. This bound is also optimal up to logarithmic
factors. Our construction uses weights with large bit complexity. We prove that
having such a large bit complexity is both necessary and sufficient for
memorization with a sub-linear number of parameters.
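To make the parameter budgets concrete, here is a minimal Python sketch (an illustration only, not the paper's construction; it ignores the logarithmic factors and the large-bit-complexity weights the construction requires) that tabulates the classical $O(N)$ budget against the $\sqrt{N}$ and $N/L$ budgets above:

```python
# Illustrative comparison of memorization parameter budgets (not the paper's
# construction; log factors and the bit-complexity requirement are ignored).
import math

def mlp_param_count(widths):
    """Weights + biases of a fully connected ReLU net with the given layer widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths[:-1], widths[1:]))

for N in (10**3, 10**4, 10**6):
    budget_linear = N                    # classical O(N) sufficiency
    budget_sqrt = int(math.sqrt(N))      # this paper, up to log factors
    L = max(1, int(math.sqrt(N)) // 2)   # an admissible depth, 1 <= L <= sqrt(N)
    budget_depth = N // L                # depth-L construction, up to log factors
    print(f"N={N:>8}  O(N)={budget_linear:>8}  ~sqrt(N)={budget_sqrt:>5}  "
          f"L={L:>4}  ~N/L={budget_depth:>6}")

# Reference point: a one-hidden-layer ReLU net with N hidden units on
# d-dimensional inputs already has about N*(d + 2) parameters (the O(N) regime).
print("width-1000 baseline on d=32 inputs:", mlp_param_count([32, 1000, 1]))
```

Even at $N = 10^6$ the sub-linear budgets are in the low thousands, versus a million for the classical linear bound; the abstract's claim is that the $\sqrt{N}$ scale is achievable and, by VC-dimension bounds, essentially optimal.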
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m \geq 1$.
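As a quick arithmetic illustration of the quoted depth formula (assuming $M$ denotes the number of classes, which the summary above does not define, and counting parameters for a generic width-2 ReLU chain rather than that paper's exact architecture):

```python
# Hypothetical worked example of the depth formula 2N + 4M - 1 quoted above.
# Assumption: M is the number of classes; the summary does not define it.
def width2_depth(N, M):
    return 2 * N + 4 * M - 1

def width2_params(depth, d_in=2, d_out=1, width=2):
    """Weights + biases of a width-2 ReLU chain with `depth` layers."""
    first = d_in * width + width
    hidden = (depth - 2) * (width * width + width)
    last = width * d_out + d_out
    return first + hidden + last

for N, M in [(10, 2), (100, 3), (1000, 10)]:
    D = width2_depth(N, M)
    print(f"N={N:>5}, M={M:>3} -> depth={D:>5}, ~params={width2_params(D):>6}")
```

Note that such a narrow chain still uses $\Theta(N)$ parameters in total, in contrast with the $\tilde{O}(\sqrt{N})$ budget of the main paper above.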
arXiv Detail & Related papers (2024-09-10T14:31:21Z)
- On the Complexity of Neural Computation in Superposition [3.9803704378699103]
Recent advances in the understanding of neural networks suggest that superposition is a key mechanism underlying the computational efficiency of large-scale networks.
We show a nearly tight upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m'} \log m')$ neurons and $O(m' \log^2 m')$ parameters.
Our results open a path for using complexity theoretic techniques in neural network interpretability research.
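For contrast with the bound quoted above, here is the standard dense baseline (not the paper's superposition construction): one ReLU neuron per pair computes $\mathrm{AND}(a,b)=\mathrm{relu}(a+b-1)$ on boolean features, so all pairwise ANDs cost $\Theta(m'^2)$ neurons.

```python
# Dense baseline (NOT the paper's superposition construction): pairwise AND of
# boolean features via one ReLU neuron per pair, AND(a, b) = relu(a + b - 1).
import itertools
import numpy as np

def pairwise_and_relu(x):
    """x: boolean vector of m' features -> all pairwise ANDs, one neuron per pair."""
    pairs = list(itertools.combinations(range(len(x)), 2))
    # Each row of W selects one pair of features; the bias -1 implements the AND threshold.
    W = np.zeros((len(pairs), len(x)))
    for row, (i, j) in enumerate(pairs):
        W[row, i] = W[row, j] = 1.0
    return np.maximum(W @ x - 1.0, 0.0), len(pairs)

x = np.array([1, 0, 1, 1], dtype=float)
out, n_neurons = pairwise_and_relu(x)
print(out)        # ANDs of pairs (0,1),(0,2),(0,3),(1,2),(1,3),(2,3) -> [0,1,1,0,0,1]
print(n_neurons)  # m'(m'-1)/2 = 6 neurons here, versus O(sqrt(m') log m') in superposition
```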
arXiv Detail & Related papers (2024-09-05T18:58:59Z)
- Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z)
- Spacetime-Efficient Low-Depth Quantum State Preparation with Applications [93.56766264306764]
We show that a novel deterministic method for preparing arbitrary quantum states requires fewer quantum resources than previous methods.
We highlight several applications where this ability would be useful, including quantum machine learning, Hamiltonian simulation, and solving linear systems of equations.
arXiv Detail & Related papers (2023-03-03T18:23:20Z)
- Quantum Resources Required to Block-Encode a Matrix of Classical Data [56.508135743727934]
We provide circuit-level implementations and resource estimates for several methods of block-encoding a dense $N \times N$ matrix of classical data to precision $\epsilon$.
We examine resource tradeoffs between the different approaches and explore implementations of two separate models of quantum random access memory (QRAM).
Our results go beyond simple query complexity and provide a clear picture into the resource costs when large amounts of classical data are assumed to be accessible to quantum algorithms.
arXiv Detail & Related papers (2022-06-07T18:00:01Z)
- Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization [14.186776881154127]
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks.
We show that the NTK is well conditioned in a challenging sub-linear setup.
Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks.
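As a hedged sketch of the quantity in question, the snippet below forms the empirical NTK Gram matrix of a randomly initialized two-layer ReLU network (an assumed toy setup, not the deep-network, minimum-over-parameterization regime the paper analyzes) and reports its smallest eigenvalue:

```python
# Minimal sketch (assumed two-layer toy setup): empirical NTK Gram matrix of
# f(x) = a^T relu(Wx) / sqrt(m) at random initialization, and its smallest eigenvalue.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 20, 4000                            # samples, input dim, hidden width
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # unit-norm inputs
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

pre = X @ W.T                                     # (n, m) pre-activations
act = np.maximum(pre, 0.0)                        # relu(Wx)
mask = (pre > 0).astype(float)                    # relu'(Wx)

# Per-example gradient w.r.t. a is relu(Wx)/sqrt(m); w.r.t. W it is
# outer(a * relu'(Wx), x)/sqrt(m). The empirical NTK is the Gram matrix of these gradients.
K_a = act @ act.T / m
K_W = ((mask * a) @ (mask * a).T) * (X @ X.T) / m
K = K_a + K_W

eigs = np.linalg.eigvalsh(K)
print("smallest NTK eigenvalue:", eigs[0])
```

For distinct, reasonably spread-out inputs this eigenvalue stays bounded away from zero, which is the kind of well-conditioning statement the paper establishes for deep networks in the sub-linear regime.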
arXiv Detail & Related papers (2022-05-20T14:50:24Z)
- An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks [40.489350374378645]
We prove that $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient.
We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.
arXiv Detail & Related papers (2021-06-14T19:42:32Z)
- Provable Memorization via Deep Neural Networks using Sub-linear Parameters [91.0268925267129]
It is known that $O(N)$ parameters are sufficient for neural networks to memorize arbitrary $N$ input-label pairs.
By exploiting depth, we show that $O(N^{2/3})$ parameters suffice to memorize $N$ pairs, under a mild condition on the separation of input points.
arXiv Detail & Related papers (2020-10-26T06:19:38Z)
- On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.