Network size and weights size for memorization with two-layers neural
networks
- URL: http://arxiv.org/abs/2006.02855v2
- Date: Tue, 3 Nov 2020 07:15:50 GMT
- Title: Network size and weights size for memorization with two-layers neural
networks
- Authors: S\'ebastien Bubeck and Ronen Eldan and Yin Tat Lee and Dan Mikulincer
- Abstract summary: We propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons.
We show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.
- Score: 15.333300054767726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In 1988, Eric B. Baum showed that two-layers neural networks with threshold
activation function can perfectly memorize the binary labels of $n$ points in
general position in $\mathbb{R}^d$ using only $\lceil n/d \rceil$
neurons. We observe that with ReLU networks, using four times as many neurons
one can fit arbitrary real labels. Moreover, for approximate memorization up to
error $\epsilon$, the neural tangent kernel can also memorize with only
$O\left(\frac{n}{d} \cdot \log(1/\epsilon) \right)$ neurons (assuming that the
data is well dispersed too). We show however that these constructions give rise
to networks where the magnitude of the neurons' weights is far from optimal.
In contrast we propose a new training procedure for ReLU networks, based on
complex (as opposed to real) recombination of the neurons, for which we show
approximate memorization with both $O\left(\frac{n}{d} \cdot
\frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal
size of the weights.
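As a point of reference for the memorization claims above, the following is a minimal sketch of the simplest regime mentioned here and in the related work: the first layer of a two-layer ReLU network is frozen at random values and only the output layer is fit by least squares, which memorizes arbitrary real labels once the width is on the order of $n$. This is not the paper's complex-recombination procedure (which also adapts the first layer and achieves the $n/d$-type scalings); the data, widths, and scalings below are arbitrary choices for illustration.

```python
# A minimal sketch (not the paper's construction): approximate memorization of n
# arbitrary real labels with a two-layer ReLU network in the random-features
# regime -- the first layer is frozen at random values and only the output layer
# is fit by least squares. All sizes and data below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)

n, d = 200, 20                                 # number of points and input dimension
m = 2 * n                                      # hidden width; this naive regime needs m ~ n

X = rng.standard_normal((n, d))                # well-dispersed inputs
y = rng.standard_normal(n)                     # arbitrary real labels to memorize

W = rng.standard_normal((m, d)) / np.sqrt(d)   # frozen random first-layer weights
b = rng.uniform(-1.0, 1.0, size=m)             # frozen random biases

H = np.maximum(X @ W.T + b, 0.0)               # ReLU feature matrix, shape (n, m)

# Solve min_a ||H a - y||_2 for the output layer (min-norm solution since m > n).
a, *_ = np.linalg.lstsq(H, y, rcond=None)

print(f"max memorization error: {np.max(np.abs(H @ a - y)):.2e}")
print(f"output-layer weight norm ||a||_2: {np.linalg.norm(a):.2f}")
```

The paper's point is precisely that neuron counts alone do not tell the whole story: its complex-recombination procedure achieves both a number of neurons scaling with $n/d$ and nearly-optimal weight size, neither of which this sketch attempts.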
Related papers
- Memorization Capacity for Additive Fine-Tuning with Small ReLU Networks [16.320374162259117]
Fine-Tuning Capacity (FTC) is defined as the maximum number of samples a neural network can fine-tune on.
We show that $N$ samples can be fine-tuned with $m=\Theta(N)$ neurons for 2-layer networks, and with $m=\Theta(\sqrt{N})$ neurons for 3-layer networks, no matter how large $K$ is.
arXiv Detail & Related papers (2024-08-01T07:58:51Z) - Rates of Approximation by ReLU Shallow Neural Networks [8.22379888383833]
We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the Hölder space.
Such rates are very close to the optimal one $O(m^{-\frac{r}{d}})$ in the sense that $\frac{d+2}{d+4}$ is close to $1$ when the dimension $d$ is large.
arXiv Detail & Related papers (2023-07-24T00:16:50Z) - Generalization Ability of Wide Neural Networks on $\mathbb{R}$ [8.508360765158326]
We study the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$.
We show that: $i)$ when the width $m \rightarrow \infty$, the neural network kernel (NNK) uniformly converges to the NTK; $ii)$ the minimax rate of regression over the RKHS associated to $K_1$ is $n^{-2/3}$; $iii)$ if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; $iv)$ ...
arXiv Detail & Related papers (2023-02-12T15:07:27Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - On the Optimal Memorization Power of ReLU Neural Networks [53.15475693468925]
We show that feedforward ReLU neural networks can memorize any $N$ points that satisfy a mild separability assumption.
We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
arXiv Detail & Related papers (2021-10-07T05:25:23Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability (a toy sketch of this idea appears after this list).
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - An Exponential Improvement on the Memorization Capacity of Deep
Threshold Networks [40.489350374378645]
We prove that $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient.
We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.
arXiv Detail & Related papers (2021-06-14T19:42:32Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn, with gradient descent, a ground truth network with small loss, beyond what the Neural Tangent Kernel approach achieves with the same number of samples.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Towards Understanding Hierarchical Learning: Benefits of Neural
Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representations can achieve improved sample complexity compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z) - Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z) - A Corrective View of Neural Networks: Representation, Memorization and
Learning [26.87238691716307]
We develop a corrective mechanism for neural network approximation.
We show that two-layer neural networks in the random features regime (RF) can memorize arbitrary labels.
We also consider three-layer neural networks and show that the corrective mechanism yields faster representation rates for smooth radial functions.
arXiv Detail & Related papers (2020-02-01T20:51:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.