Provable Memorization via Deep Neural Networks using Sub-linear
Parameters
- URL: http://arxiv.org/abs/2010.13363v2
- Date: Tue, 2 Nov 2021 15:33:07 GMT
- Title: Provable Memorization via Deep Neural Networks using Sub-linear
Parameters
- Authors: Sejun Park, Jaeho Lee, Chulhee Yun, Jinwoo Shin
- Abstract summary: It is known that $O(N)$ parameters are sufficient for neural networks to memorize arbitrary $N$ input-label pairs.
By exploiting depth, we show that $O(N^{2/3})$ parameters suffice to memorize $N$ pairs, under a mild condition on the separation of input points.
- Score: 91.0268925267129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is known that $O(N)$ parameters are sufficient for neural networks to
memorize arbitrary $N$ input-label pairs. By exploiting depth, we show that
$O(N^{2/3})$ parameters suffice to memorize $N$ pairs, under a mild condition
on the separation of input points. In particular, deeper networks (even with
width $3$) are shown to memorize more pairs than shallow networks, which also
agrees with the recent line of work on the benefits of depth for function
approximation. We also provide empirical results that support our theoretical
findings.
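To make the scaling concrete, the sketch below is a back-of-the-envelope illustration in Python, not the paper's construction: the input dimension $d = 16$ and the dataset sizes are assumptions chosen purely for illustration. It counts the parameters of a depth-$L$, width-$3$ ReLU network and compares the classical $O(N)$ parameter budget with the $O(N^{2/3})$ budget for a few values of $N$.

```python
# Illustrative sketch only: counts parameters of a deep, narrow (width-3)
# ReLU network and compares the O(N) and O(N^{2/3}) parameter budgets from
# the abstract. It does not reproduce the paper's memorization construction.

def width3_param_count(input_dim: int, depth: int) -> int:
    """Parameters of a ReLU MLP input_dim -> 3 -> ... -> 3 -> 1 with `depth` width-3 hidden layers."""
    first = 3 * input_dim + 3           # first hidden layer (weights + biases)
    hidden = (depth - 1) * (3 * 3 + 3)  # remaining width-3 hidden layers
    out = 3 + 1                         # scalar output layer
    return first + hidden + out

if __name__ == "__main__":
    d = 16  # assumed input dimension, for illustration only
    for N in (10**4, 10**6, 10**8):
        sublinear = N ** (2 / 3)
        # depth at which a width-3 network reaches roughly N^{2/3} parameters
        depth = max(1, round((sublinear - (3 * d + 3) - 4) / 12) + 1)
        print(f"N = {N:>11,}   O(N) budget ~ {N:>11,}   "
              f"O(N^(2/3)) budget ~ {sublinear:>11,.0f}   "
              f"(width-3 depth ~ {depth:,}, params = {width3_param_count(d, depth):,})")
```

The point of the comparison is only that depth is the cheap resource here: a width-3 network's parameter count grows roughly linearly with its depth, so a sub-linear parameter budget translates into a correspondingly modest depth budget.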
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Optimal Approximation and Learning Rates for Deep Convolutional Neural
Networks [17.075804626858748]
This paper focuses on approximation and learning performance analysis for deep convolutional neural networks with zero-padding and max-pooling.
We prove that, to approximate $r$-smooth functions, the approximation rates of deep convolutional neural networks with depth $L$ are of order $(L^2/\log L)^{-2r/d}$, which is optimal up to a logarithmic factor.
arXiv Detail & Related papers (2023-08-07T02:37:02Z) - Rates of Approximation by ReLU Shallow Neural Networks [8.22379888383833]
We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the Hölder space.
Such rates are very close to the optimal one $O(m^{-r/d})$ in the sense that $\frac{d+2}{d+4}$ is close to $1$ when the dimension $d$ is large.
arXiv Detail & Related papers (2023-07-24T00:16:50Z) - Sharp Lower Bounds on Interpolation by Deep ReLU Neural Networks at
Irregularly Spaced Data [2.7195102129095003]
Deep ReLU neural networks can interpolate values at $N$ datapoints which are separated by a distance $\delta$.
We show that $\Omega(N)$ parameters are required in the regime where $\delta$ is exponentially small in $N$.
As an application we give a lower bound on the approximation rates that deep ReLU neural networks can achieve for Sobolev spaces at the embedding endpoint.
arXiv Detail & Related papers (2023-02-02T02:46:20Z) - Memorization and Optimization in Deep Neural Networks with Minimum
Over-parameterization [14.186776881154127]
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks.
We show that the NTK is well conditioned in a challenging sub-linear setup.
Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks.
arXiv Detail & Related papers (2022-05-20T14:50:24Z) - On the Optimal Memorization Power of ReLU Neural Networks [53.15475693468925]
We show that feedforward ReLU neural networks can memorize any $N$ points that satisfy a mild separability assumption.
We prove that a large bit complexity of the weights is both necessary and sufficient for memorization with a sub-linear number of parameters.
arXiv Detail & Related papers (2021-10-07T05:25:23Z) - Exploring the Common Principal Subspace of Deep Features in Neural
Networks [50.37178960258464]
We find that different Deep Neural Networks (DNNs) trained with the same dataset share a common principal subspace in latent spaces.
Specifically, we design a new metric, the $\mathcal{P}$-vector, to represent the principal subspace of deep features learned in a DNN.
Small angles (with cosine close to $1.0$) are found between the $\mathcal{P}$-vectors of any two DNNs trained with different algorithms/architectures.
arXiv Detail & Related papers (2021-10-06T15:48:32Z) - Function approximation by deep neural networks with parameters $\{0,\pm
\frac{1}{2}, \pm 1, 2\}$ [91.3755431537592]
It is shown that $C_\beta$-smooth functions can be approximated by neural networks with parameters $\{0, \pm \frac{1}{2}, \pm 1, 2\}$.
The depth, width and the number of active parameters of the constructed networks have, up to a logarithmic factor, the same dependence on the approximation error as networks with parameters in $[-1,1]$.
arXiv Detail & Related papers (2021-03-15T19:10:02Z) - Towards Understanding Hierarchical Learning: Benefits of Neural
Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)