An Exponential Improvement on the Memorization Capacity of Deep
Threshold Networks
- URL: http://arxiv.org/abs/2106.07724v1
- Date: Mon, 14 Jun 2021 19:42:32 GMT
- Title: An Exponential Improvement on the Memorization Capacity of Deep
Threshold Networks
- Authors: Shashank Rajput, Kartik Sreenivasan, Dimitris Papailiopoulos, Amin
Karbasi
- Abstract summary: We prove that $\widetilde{\mathcal{O}}(\frac{1}{\delta}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient.
We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.
- Score: 40.489350374378645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that modern deep neural networks are powerful enough to
memorize datasets even when the labels have been randomized. Recently,
Vershynin (2020) settled a long standing question by Baum (1988), proving that
\emph{deep threshold} networks can memorize $n$ points in $d$ dimensions using
$\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and
$\widetilde{\mathcal{O}}(e^{1/\delta^2}(d+\sqrt{n})+n)$ weights, where $\delta$
is the minimum distance between the points. In this work, we improve the
dependence on $\delta$ from exponential to almost linear, proving that
$\widetilde{\mathcal{O}}(\frac{1}{\delta}+\sqrt{n})$ neurons and
$\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient. Our
construction uses Gaussian random weights only in the first layer, while all
the subsequent layers use binary or integer weights. We also prove new lower
bounds by connecting memorization in neural networks to the purely geometric
problem of separating $n$ points on a sphere using hyperplanes.
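For a rough sense of how the two neuron bounds scale, the hedged sketch below evaluates their leading terms for a few values of $n$ and $\delta$, dropping the polylogarithmic factors hidden by $\widetilde{\mathcal{O}}(\cdot)$; the function names and the sample values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hedged sketch: compare the leading terms of the two neuron bounds quoted in
# the abstract, ignoring the polylog factors hidden in the O-tilde notation.
# The values of n and delta below are illustrative assumptions.

def neurons_vershynin(n, delta):
    # Vershynin (2020): O~(e^{1/delta^2} + sqrt(n)) neurons
    return np.exp(1.0 / delta**2) + np.sqrt(n)

def neurons_this_work(n, delta):
    # This paper: O~(1/delta + sqrt(n)) neurons
    return 1.0 / delta + np.sqrt(n)

n = 10_000
for delta in (0.5, 0.2, 0.1):
    print(f"delta = {delta:4}: "
          f"e^(1/delta^2) + sqrt(n) ~ {neurons_vershynin(n, delta):.3e}, "
          f"1/delta + sqrt(n) ~ {neurons_this_work(n, delta):.3e}")
```

Even for moderately small $\delta$, the exponential term dominates the earlier bound, while the improved bound stays essentially at the $\sqrt{n}$ level.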
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Rates of Approximation by ReLU Shallow Neural Networks [8.22379888383833]
We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the Hölder space.
Such rates are very close to the optimal one $O(m^{-\frac{r}{d}})$ in the sense that $\frac{d+2}{d+4}$ is close to $1$ when the dimension $d$ is large.
arXiv Detail & Related papers (2023-07-24T00:16:50Z) - Neural Networks Efficiently Learn Low-Dimensional Representations with
SGD [22.703825902761405]
We show that SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction (see the toy sketch after this list).
We also provide compression guarantees for NNs using the approximate low-rank structure produced by SGD.
arXiv Detail & Related papers (2022-09-29T15:29:10Z) - Learning (Very) Simple Generative Models Is Hard [45.13248517769758]
We show that no polynomial-time algorithm can solve this problem even when the output coordinates of the map $\mathbb{R}^d\to\mathbb{R}^{d'}$ are one-hidden-layer ReLU networks with $\mathrm{poly}(d)$ neurons.
A key ingredient in our proof is an ODE-based construction of a compactly supported, piecewise-linear function $f$ with polynomially-bounded slopes such that the pushforward of $\mathcal{N}(0,1)$ under $f$ matches all low-degree moments of $\mathcal{N}(0,1)$.
arXiv Detail & Related papers (2022-05-31T17:59:09Z) - On the Optimal Memorization Power of ReLU Neural Networks [53.15475693468925]
We show that feedforward ReLU neural networks can memorize any $N$ points that satisfy a mild separability assumption.
We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
arXiv Detail & Related papers (2021-10-07T05:25:23Z) - Distributed Saddle-Point Problems Under Similarity [173.19083235638104]
We show that a given suboptimality $\epsilon>0$ is achieved over master/workers networks in $\Omega\big(\cdot\big)$ rounds of communication.
We then propose algorithms matching the lower bounds for either type of network (up to log factors).
We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.
arXiv Detail & Related papers (2021-07-22T14:25:16Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network trained with gradient descent can provably learn the ground-truth network with small population loss, going beyond what Neural Tangent Kernel arguments give with comparable samples.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Network size and weights size for memorization with two-layers neural
networks [15.333300054767726]
We propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons.
We show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons and nearly-optimal size of the weights.
arXiv Detail & Related papers (2020-06-04T13:44:57Z)
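As referenced above, here is a toy numerical probe of the single-index claim from "Neural Networks Efficiently Learn Low-Dimensional Representations with SGD": it trains a small two-layer ReLU network with plain SGD on synthetic data $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle)+\epsilon$ and then checks how the dominant direction of the first-layer weights aligns with $\boldsymbol{u}$. The choice of $f$, the network width, the step size, and the SVD-based alignment check are all illustrative assumptions; this is a sketch, not that paper's architecture, training regime, or guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-index data: y = f(<u, x>) + noise, with f = ReLU here
# (an illustrative choice; the paper's setting is more general).
d, n_train = 20, 2000
u = rng.normal(size=d)
u /= np.linalg.norm(u)
X = rng.normal(size=(n_train, d))
y = np.maximum(X @ u, 0.0) + 0.1 * rng.normal(size=n_train)

# Small two-layer ReLU network trained with plain mini-batch SGD on squared loss.
width, lr, epochs, batch = 32, 0.05, 50, 64
W = rng.normal(size=(width, d)) / np.sqrt(d)   # first-layer weights
a = rng.normal(size=width) / np.sqrt(width)    # second-layer weights

for _ in range(epochs):
    idx = rng.permutation(n_train)
    for start in range(0, n_train, batch):
        b = idx[start:start + batch]
        Xb, yb = X[b], y[b]
        H = np.maximum(Xb @ W.T, 0.0)          # hidden activations
        err = H @ a - yb                       # prediction error
        grad_a = H.T @ err / len(b)
        grad_W = ((err[:, None] * a) * (H > 0)).T @ Xb / len(b)
        a -= lr * grad_a
        W -= lr * grad_W

# How well does the dominant direction of W align with the true index u?
_, _, Vt = np.linalg.svd(W, full_matrices=False)
print(f"|cos| between top right-singular vector of W and u: {abs(Vt[0] @ u):.3f}")
```

The printed cosine is only a heuristic indicator of "principal direction recovery"; the paper's formal statement concerns SGD-trained networks under its own assumptions on $f$, the data, and the parameterization.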