Provable Tempered Overfitting of Minimal Nets and Typical Nets
- URL: http://arxiv.org/abs/2410.19092v1
- Date: Thu, 24 Oct 2024 18:51:56 GMT
- Title: Provable Tempered Overfitting of Minimal Nets and Typical Nets
- Authors: Itamar Harel, William M. Hoza, Gal Vardi, Itay Evron, Nathan Srebro, Daniel Soudry
- Abstract summary: We study the overfitting behavior of fully connected deep Neural Networks (NNs).
We consider using both the smallest NN (having the minimal number of weights) and a random interpolating NN.
For both learning rules, we prove overfitting is tempered.
- Abstract: We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weights) and a random interpolating NN. For both learning rules, we prove overfitting is tempered. Our analysis rests on a new bound on the size of a threshold circuit consistent with a partial function. To the best of our knowledge, ours are the first theoretical results on benign or tempered overfitting that: (1) apply to deep NNs, and (2) do not require a very high or very low input dimension.
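As a rough illustration of what "tempered" overfitting means (an interpolating rule whose test error exceeds the Bayes error but stays far from the trivial 1/2), the sketch below fits a 1-nearest-neighbor classifier to noisy binary labels. This is not the paper's construction (binary-weight deep NNs); the data distribution, noise rate, and sample sizes are arbitrary choices made only for illustration.

```python
# Illustrative only: an interpolating rule whose overfitting is "tempered".
# NOT the paper's construction (binary-weight deep NNs); 1-NN is used here
# because it fits the training set exactly by design.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
noise_rate = 0.2                       # probability of flipping the clean label
n_train, n_test, dim = 2000, 20000, 5  # arbitrary illustrative sizes

def sample(n):
    x = rng.uniform(-1.0, 1.0, size=(n, dim))
    clean = (x.sum(axis=1) > 0).astype(int)   # a simple "true" concept
    flipped = rng.random(n) < noise_rate
    return x, np.where(flipped, 1 - clean, clean)

x_tr, y_tr = sample(n_train)
x_te, y_te = sample(n_test)

clf = KNeighborsClassifier(n_neighbors=1).fit(x_tr, y_tr)  # fits the noise exactly
print("train error:", 1.0 - clf.score(x_tr, y_tr))         # 0.0 (interpolation)
print("test  error:", 1.0 - clf.score(x_te, y_te))         # between 0.2 and 0.5
print("Bayes error:", noise_rate, "| trivial error: 0.5")
```

With these settings the training error is zero while the test error typically lands well above the 0.2 noise floor yet well below 0.5, which is the qualitative pattern the abstract calls tempered overfitting.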
Related papers
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel. (A short sketch of computing an empirical NTK appears after the related-papers list.)
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive? [16.105097124039602]
We study the theory of neural network (NN) from the lens of classical nonparametric regression problems.
Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
arXiv Detail & Related papers (2022-04-20T17:55:16Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Geometric Path Enumeration for Equivalence Verification of Neural Networks [2.007262412327553]
We focus on the formal verification problem of NN equivalence which aims to prove that two NNs show equivalent behavior.
We show a theoretical result by proving that the epsilon-equivalence problem is coNP-complete.
Finally, we implement the extended algorithm for equivalence verification and evaluate the optimizations necessary for its practical use.
arXiv Detail & Related papers (2021-12-13T11:56:08Z) - On the Equivalence between Neural Network and Support Vector Machine [23.174679357972984]
The dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by the Neural Tangent Kernel (NTK).
We establish the equivalence between the NN and the support vector machine (SVM).
Our main theoretical results include establishing the equivalence between the NN and a broad family of $\ell_2$-regularized kernel machines (KMs) with finite-width bounds.
arXiv Detail & Related papers (2021-11-11T06:05:00Z) - Neural Optimization Kernel: Towards Robust Deep Learning [13.147925376013129]
Recent studies show a connection between neural networks (NN) and kernel methods.
This paper proposes a novel kernel family named Neural Optimization Kernel (NOK).
We show that an over-parameterized deep NN (NOK) can increase the expressive power to reduce the empirical risk and reduce the generalization bound at the same time.
arXiv Detail & Related papers (2021-06-11T00:34:55Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
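The last entry's idea, constraining the fine-tuned hypothesis to a small sphere centred on the pre-trained weights, can be sketched as a projection applied after each optimizer step. This is a generic reading of that constraint, not necessarily the paper's exact algorithm; the model, radius, and data below are placeholders.

```python
# A hedged sketch of distance-based regularisation for fine-tuning:
# after every gradient step, project the weights back onto an L2 ball of
# radius `radius` centred on the pre-trained weights. Model, data, and
# radius are placeholders, not the paper's exact procedure.
import torch
import torch.nn as nn

model = nn.Linear(128, 10)                       # stand-in for a pre-trained network
pretrained = [p.detach().clone() for p in model.parameters()]
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
radius = 1.0                                     # size of the allowed sphere

def project_to_ball(model, pretrained, radius):
    """Project each parameter tensor onto the L2 ball around its pre-trained value."""
    with torch.no_grad():
        for p, p0 in zip(model.parameters(), pretrained):
            delta = p - p0
            norm = delta.norm()
            if norm > radius:
                p.copy_(p0 + delta * (radius / norm))

for step in range(100):                          # dummy fine-tuning loop
    x = torch.randn(32, 128)
    y = torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    project_to_ball(model, pretrained, radius)   # enforce the distance constraint
```

Whether the ball is taken per parameter tensor (as here) or over the full concatenated weight vector is a design choice; the paper's analysis involves particular norm choices that this sketch does not try to reproduce.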
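Several of the entries above (the bias-generalized NTK, the pruned-network NTK, and the NN-SVM equivalence) revolve around the neural tangent kernel. As a concrete reference point, the sketch below computes the empirical NTK of a small finite-width ReLU network, K(x, x') = <df(x)/dtheta, df(x')/dtheta>; the architecture and width are arbitrary, and this finite-width kernel only approximates the infinite-width limits those papers study.

```python
# A hedged sketch of the empirical (finite-width) NTK of a small ReLU network:
#   K(x, x') = <grad_theta f(x), grad_theta f(x')>.
# Architecture and width are arbitrary illustrative choices.
import torch
import torch.nn as nn

torch.manual_seed(0)
width = 4096
net = nn.Sequential(nn.Linear(16, width), nn.ReLU(), nn.Linear(width, 1, bias=False))

def grad_vector(net, x):
    """Flattened gradient of the scalar output f(x) with respect to all parameters."""
    net.zero_grad()
    net(x.unsqueeze(0)).squeeze().backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

x1, x2 = torch.randn(16), torch.randn(16)
g1, g2 = grad_vector(net, x1), grad_vector(net, x2)
print("K(x1, x1) =", float(g1 @ g1))
print("K(x1, x2) =", float(g1 @ g2))
```

In the infinite-width limit studied by the papers above, this empirical kernel concentrates around a deterministic limiting kernel; the sketch is only meant to make that object concrete.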