Benign Overfitting in Two-layer Convolutional Neural Networks
- URL: http://arxiv.org/abs/2202.06526v1
- Date: Mon, 14 Feb 2022 07:45:51 GMT
- Title: Benign Overfitting in Two-layer Convolutional Neural Networks
- Authors: Yuan Cao and Zixiang Chen and Mikhail Belkin and Quanquan Gu
- Abstract summary: We study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN).
We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss.
On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant level test loss.
- Score: 90.75603889605043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern neural networks often have great expressive power and can be trained
to overfit the training data, while still achieving a good test performance.
This phenomenon is referred to as "benign overfitting". Recently, a line of
works has emerged studying "benign overfitting" from a theoretical perspective.
However, they are limited to linear models or kernel/random feature models, and
there is still a lack of theoretical understanding about when and how benign
overfitting occurs in neural networks. In this paper, we study the benign
overfitting phenomenon in training a two-layer convolutional neural network
(CNN). We show that when the signal-to-noise ratio satisfies a certain
condition, a two-layer CNN trained by gradient descent can achieve arbitrarily
small training and test loss. On the other hand, when this condition does not
hold, overfitting becomes harmful and the obtained CNN can only achieve
constant level test loss. These together demonstrate a sharp phase transition
between benign overfitting and harmful overfitting, driven by the
signal-to-noise ratio. To the best of our knowledge, this is the first work
that precisely characterizes the conditions under which benign overfitting can
occur in training convolutional neural networks.
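To make the setting concrete, below is a minimal sketch of the kind of experiment the abstract describes: a two-layer CNN trained by gradient descent on signal-plus-noise data, where each example contains one patch carrying a label-dependent signal and one patch of pure noise. The data model, filter parameterization, and hyperparameters here are illustrative assumptions, not the paper's exact construction.

```python
# Sketch (assumed setup, not the paper's exact model): two-layer CNN with
# ReLU filters for the +/- classes, trained by full-batch gradient descent
# on logistic loss. Each example has two patches: y * mu (signal) and
# Gaussian noise; `snr` controls the signal strength relative to unit noise.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 100, 20           # samples, patch dimension, filters per class
snr = 2.0                       # illustrative signal-to-noise level
mu = np.zeros(d); mu[0] = snr   # fixed signal direction (assumption)

y = rng.choice([-1.0, 1.0], size=n)
noise = rng.normal(size=(n, d))
X = np.stack([y[:, None] * mu, noise], axis=1)   # shape (n, 2 patches, d)

W = 0.01 * rng.normal(size=(2, m, d))            # filters for +/- classes

def forward(W, X):
    # f(x) = mean_j sum_p relu(<w_{+,j}, x_p>) - mean_j sum_p relu(<w_{-,j}, x_p>)
    pre = np.einsum('smd,npd->nsmp', W, X)
    pooled = np.maximum(pre, 0.0).sum(axis=3).mean(axis=2)   # (n, 2)
    return pooled[:, 0] - pooled[:, 1]

def loss_and_grad(W, X, y):
    f = forward(W, X)
    margin = y * f
    loss = np.log1p(np.exp(-margin)).mean()
    coef = -y / (1.0 + np.exp(margin)) / n       # d(loss)/d(f_n)
    pre = np.einsum('smd,npd->nsmp', W, X)
    mask = (pre > 0).astype(float)               # ReLU subgradient
    sign = np.array([1.0, -1.0])                 # +class filters minus -class
    gW = np.einsum('n,s,nsmp,npd->smd', coef, sign, mask, X) / m
    return loss, gW

lr = 0.5
for t in range(2000):
    loss, gW = loss_and_grad(W, X, y)
    W -= lr * gW

train_err = np.mean(np.sign(forward(W, X)) != y)
print(f"final training loss {loss:.4f}, training error {train_err:.2f}")
```

Sweeping `snr` in a sketch like this is one way to probe the transition the paper characterizes: gradient descent drives the training loss toward zero in either case, while test behavior depends on whether the filters align with the signal direction or memorize the noise patches.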
Related papers
- Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for
XOR Data [24.86314525762012]
We show that a ReLU CNN trained by gradient descent can achieve near Bayes-optimal accuracy.
Our result demonstrates that CNNs have a remarkable capacity to efficiently learn XOR problems, even in the presence of highly correlated features.
arXiv Detail & Related papers (2023-10-03T11:31:37Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Theoretical Characterization of How Neural Network Pruning Affects its
Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting [19.08269066145619]
Some interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance.
We argue that real interpolating methods like neural networks do not fit benignly; instead they fall into an intermediate, "tempered" regime of overfitting.
arXiv Detail & Related papers (2022-07-14T00:23:01Z) - Benign Overfitting without Linearity: Neural Network Classifiers Trained
by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained to interpolation by gradient descent.
We show that neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error.
In contrast to previous work on benign overfitting that requires linear or kernel-based predictors, our analysis holds in a setting where both the model and the learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z) - Feature Purification: How Adversarial Training Performs Robust Deep
Learning [66.05472746340142]
We present a principle we call Feature Purification: one cause of the existence of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z) - Compressive sensing with un-trained neural networks: Gradient descent
finds the smoothest approximation [60.80172153614544]
Un-trained convolutional neural networks have emerged as highly successful tools for image recovery and restoration.
We show that an un-trained convolutional neural network can approximately reconstruct signals and images that are sufficiently structured, from a near minimal number of random measurements.
arXiv Detail & Related papers (2020-05-07T15:57:25Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural
Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)