Provable Memorization via Deep Neural Networks using Sub-linear
Parameters
- URL: http://arxiv.org/abs/2010.13363v2
- Date: Tue, 2 Nov 2021 15:33:07 GMT
- Title: Provable Memorization via Deep Neural Networks using Sub-linear
Parameters
- Authors: Sejun Park, Jaeho Lee, Chulhee Yun, Jinwoo Shin
- Abstract summary: It is known that $O(N)$ parameters are sufficient for neural networks to memorize arbitrary $N$ input-label pairs.
By exploiting depth, we show that $O(N^{2/3})$ parameters suffice to memorize $N$ pairs, under a mild condition on the separation of input points.
- Score: 91.0268925267129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is known that $O(N)$ parameters are sufficient for neural networks to
memorize arbitrary $N$ input-label pairs. By exploiting depth, we show that
$O(N^{2/3})$ parameters suffice to memorize $N$ pairs, under a mild condition
on the separation of input points. In particular, deeper networks (even with
width $3$) are shown to memorize more pairs than shallow networks, which also
agrees with the recent line of work on the benefits of depth for function
approximation. We also provide empirical results that support our theoretical
findings.
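To make the scaling concrete, the sketch below is a back-of-the-envelope illustration in Python, not the paper's construction: the input dimension $d = 16$ and the dataset sizes are assumptions chosen purely for illustration. It counts the parameters of a depth-$L$, width-$3$ ReLU network and compares the classical $O(N)$ parameter budget with the $O(N^{2/3})$ budget for a few values of $N$.

```python
# Illustrative sketch only: counts parameters of a deep, narrow (width-3)
# ReLU network and compares the O(N) and O(N^{2/3}) parameter budgets from
# the abstract. It does not reproduce the paper's memorization construction.

def width3_param_count(input_dim: int, depth: int) -> int:
    """Parameters of a ReLU MLP input_dim -> 3 -> ... -> 3 -> 1 with `depth` width-3 hidden layers."""
    first = 3 * input_dim + 3           # first hidden layer (weights + biases)
    hidden = (depth - 1) * (3 * 3 + 3)  # remaining width-3 hidden layers
    out = 3 + 1                         # scalar output layer
    return first + hidden + out

if __name__ == "__main__":
    d = 16  # assumed input dimension, for illustration only
    for N in (10**4, 10**6, 10**8):
        sublinear = N ** (2 / 3)
        # depth at which a width-3 network reaches roughly N^{2/3} parameters
        depth = max(1, round((sublinear - (3 * d + 3) - 4) / 12) + 1)
        print(f"N = {N:>11,}   O(N) budget ~ {N:>11,}   "
              f"O(N^(2/3)) budget ~ {sublinear:>11,.0f}   "
              f"(width-3 depth ~ {depth:,}, params = {width3_param_count(d, depth):,})")
```

The point of the comparison is only that depth is the cheap resource here: a width-3 network's parameter count grows roughly linearly with its depth, so a sub-linear parameter budget translates into a correspondingly modest depth budget.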
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Optimal Approximation and Learning Rates for Deep Convolutional Neural
Networks [17.075804626858748]
This paper focuses on approximation and learning performance analysis for deep convolutional neural networks with zero-padding and max-pooling.
We prove that, to approximate $r$-smooth functions, the approximation rates of deep convolutional neural networks with depth $L$ are of order $(L^2/\log L)^{-2r/d}$, which is optimal up to a logarithmic factor.
arXiv Detail & Related papers (2023-08-07T02:37:02Z) - Rates of Approximation by ReLU Shallow Neural Networks [8.22379888383833]
We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the Hölder space.
Such rates are very close to the optimal one $O(m^{-r/d})$ in the sense that $\frac{d+2}{d+4}$ is close to $1$ when the dimension $d$ is large.
arXiv Detail & Related papers (2023-07-24T00:16:50Z) - Sharp Lower Bounds on Interpolation by Deep ReLU Neural Networks at
Irregularly Spaced Data [2.7195102129095003]
Deep ReLU neural networks can interpolate values at $N$ datapoints which are separated by a distance $\delta$.
We show that $\Omega(N)$ parameters are required in the regime where $\delta$ is exponentially small in $N$.
As an application we give a lower bound on the approximation rates that deep ReLU neural networks can achieve for Sobolev spaces at the embedding endpoint.
arXiv Detail & Related papers (2023-02-02T02:46:20Z) - Memorization and Optimization in Deep Neural Networks with Minimum
Over-parameterization [14.186776881154127]
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks.
We show that the NTK is well conditioned in a challenging sub-linear setup.
Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks.
arXiv Detail & Related papers (2022-05-20T14:50:24Z) - On the Optimal Memorization Power of ReLU Neural Networks [53.15475693468925]
We show that feedforward ReLU neural networks can memorize any $N$ points that satisfy a mild separability assumption.
We prove that a large bit complexity of the weights is both necessary and sufficient for memorization with a sub-linear number of parameters.
arXiv Detail & Related papers (2021-10-07T05:25:23Z) - Exploring the Common Principal Subspace of Deep Features in Neural
Networks [50.37178960258464]
We find that different Deep Neural Networks (DNNs) trained with the same dataset share a common principal subspace in latent spaces.
Specifically, we design a new metric, the $\mathcal{P}$-vector, to represent the principal subspace of deep features learned in a DNN.
Small angles (with cosine close to $1.0$) are found between the $\mathcal{P}$-vectors of any two DNNs trained with different algorithms/architectures.
arXiv Detail & Related papers (2021-10-06T15:48:32Z) - Function approximation by deep neural networks with parameters $\{0,\pm
\frac{1}{2}, \pm 1, 2\}$ [91.3755431537592]
It is shown that $C_\beta$-smooth functions can be approximated by neural networks with parameters $\{0, \pm \frac{1}{2}, \pm 1, 2\}$.
The depth, width and the number of active parameters of the constructed networks have, up to a logarithmic factor, the same dependence on the approximation error as networks with parameters in $[-1,1]$.
arXiv Detail & Related papers (2021-03-15T19:10:02Z) - Towards Understanding Hierarchical Learning: Benefits of Neural
Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)