Benign Overfitting for Two-layer ReLU Convolutional Neural Networks
- URL: http://arxiv.org/abs/2303.04145v2
- Date: Sat, 4 Nov 2023 01:46:47 GMT
- Title: Benign Overfitting for Two-layer ReLU Convolutional Neural Networks
- Authors: Yiwen Kou and Zixiang Chen and Yuanzhou Chen and Quanquan Gu
- Abstract summary: We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
- Score: 60.19739010031304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep learning models with great expressive power can be trained to
overfit the training data but still generalize well. This phenomenon is
referred to as \textit{benign overfitting}. Recently, a few studies have
attempted to theoretically understand benign overfitting in neural networks.
However, these works are either limited to neural networks with smooth
activation functions or to the neural tangent kernel regime. How and when
benign overfitting can occur in ReLU neural networks remains an open problem.
In this work, we seek to answer this question by establishing
algorithm-dependent risk bounds for learning two-layer ReLU convolutional
neural networks with label-flipping noise. We show that, under mild conditions,
the neural network trained by gradient descent can achieve near-zero training
loss and Bayes optimal test risk. Our result also reveals a sharp transition, in
terms of test risk, between benign and harmful overfitting under different
conditions on the data distribution. Experiments on synthetic data back up our
theory.
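To make the setup concrete, here is a minimal sketch, not the authors' code, of the kind of synthetic experiment the abstract describes: two-patch data with a planted signal vector, label-flipping noise on the training labels, and a two-layer ReLU CNN trained by plain gradient descent. The dimensions, learning rate, fixed ±1 second layer, and summation pooling are illustrative assumptions layered on top of the abstract's description.

```python
# Sketch of the benign-overfitting setting (all hyperparameters illustrative).
import torch

def run_experiment(signal_strength, n=50, d=100, m=20, flip_p=0.1,
                   steps=2000, lr=0.1, n_test=1000, seed=0):
    torch.manual_seed(seed)
    mu = torch.zeros(d)
    mu[0] = signal_strength                    # planted signal direction

    def sample(k):
        y_clean = torch.randint(0, 2, (k,)) * 2 - 1     # labels in {-1,+1}
        flip = torch.rand(k) < flip_p                   # label-flipping noise
        y = torch.where(flip, -y_clean, y_clean).float()
        # two patches per example: a signal patch y_clean*mu, a pure-noise patch
        X = torch.stack([y_clean[:, None] * mu, torch.randn(k, d)], dim=1)
        return X, y, y_clean.float()

    X, y, _ = sample(n)
    W = (0.01 * torch.randn(2, m, d)).requires_grad_()  # filters for +/- class

    def f(X):
        # ReLU filters applied to every patch, pooled by summation; the second
        # layer is fixed to +1 and -1 (a common simplification in this line of work)
        act = torch.relu(torch.einsum('cmd,kpd->kcmp', W, X)).sum(dim=(2, 3))
        return act[:, 0] - act[:, 1]

    for _ in range(steps):                              # plain gradient descent
        loss = torch.nn.functional.softplus(-y * f(X)).mean()  # logistic loss
        loss.backward()
        with torch.no_grad():
            W -= lr * W.grad
        W.grad = None

    Xt, _, yt_clean = sample(n_test)                    # fresh test data
    with torch.no_grad():
        test_err = (f(Xt).sign() != yt_clean).float().mean().item()
    return loss.item(), test_err

train_loss, test_err = run_experiment(signal_strength=5.0)
print(f"train loss {train_loss:.4f}, clean test error {test_err:.3f}")
```

In the benign regime the network drives the training loss toward zero even on flipped labels, while the error on clean test labels stays small, which is the behavior the paper's risk bounds characterize.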
Related papers
- Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data [66.1211659120882]
The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well.
While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks.
arXiv Detail & Related papers (2023-10-29T08:47:48Z)
- Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting [19.08269066145619]
Some interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance.
We argue that real interpolating methods like neural networks do not fit benignly.
arXiv Detail & Related papers (2022-07-14T00:23:01Z)
- Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions [19.762318115851617]
We analyze the mean squared error of deep convolutional neural networks.
We show that, for additive ridge function targets, convolutional neural networks followed by one fully connected layer with ReLU activation can reach minimax-optimal rates (a schematic example of such targets follows this entry).
arXiv Detail & Related papers (2022-02-24T14:22:32Z)
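For readers unfamiliar with the target class in this entry, an additive ridge function is a sum of univariate functions applied to one-dimensional projections of the input, f(x) = Σ_j g_j(w_j · x). A hypothetical example, where the directions and link functions are arbitrary choices for illustration:

```python
# Hypothetical additive ridge function f(x) = sum_j g_j(w_j . x); the
# directions w_j and univariate links g_j below are made-up choices.
import numpy as np

rng = np.random.default_rng(0)
d = 10
W_dirs = rng.standard_normal((3, d))            # ridge directions w_j
links = [np.tanh, np.abs, lambda t: t ** 2]     # univariate links g_j

def additive_ridge(x):
    return sum(g(w @ x) for g, w in zip(links, W_dirs))

print(additive_ridge(rng.standard_normal(d)))
```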
- Benign Overfitting in Two-layer Convolutional Neural Networks [90.75603889605043]
We study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN).
We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss.
On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant-level test loss (a toy illustration of this dichotomy follows this entry).
arXiv Detail & Related papers (2022-02-14T07:45:51Z)
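This is the same benign/harmful dichotomy that the main paper above refines for ReLU activations. As a toy illustration, the hypothetical run_experiment() helper from the sketch after the main abstract can be called at a strong and a weak signal strength; the actual threshold proved in the paper depends on n, d, and the noise scale, not on this made-up pair of values.

```python
# Toy illustration of the benign/harmful dichotomy, reusing the hypothetical
# run_experiment() defined in the sketch after the main abstract above.
for s in (5.0, 0.2):                       # strong vs weak planted signal
    tl, err = run_experiment(signal_strength=s)
    print(f"signal {s}: train loss {tl:.4f}, clean test error {err:.3f}")
```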
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how benign overfitting can occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting [135.0863818867184]
Artificial neural variability (ANV) helps artificial neural networks learn some advantages from "natural" neural networks.
ANV acts as an implicit regularizer of the mutual information between the training data and the learned model.
It can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible costs.
arXiv Detail & Related papers (2020-11-12T06:06:33Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that learning on top of neural representations can achieve improved sample complexity compared with learning on the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
- Bidirectionally Self-Normalizing Neural Networks [46.20979546004718]
We provide a rigorous result that shows, under mild conditions, how the vanishing/exploding gradients problem disappears with high probability if the neural networks have sufficient width.
Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network through a new class of activation functions.
arXiv Detail & Related papers (2020-06-22T12:07:29Z)
- A Deep Conditioning Treatment of Neural Networks [37.192369308257504]
We show that depth improves trainability of neural networks by improving the conditioning of certain kernel matrices of the input data.
We provide versions of the result that hold for training just the top layer of the neural network, as well as for training all layers via the neural tangent kernel (a toy conditioning check follows this entry).
arXiv Detail & Related papers (2020-02-04T20:21:36Z)
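A quick numerical check, loosely in the spirit of this last entry, compares the conditioning of a Gram matrix of correlated inputs before and after one random ReLU feature layer. The data, feature map, and width below are assumptions for illustration, not the paper's construction.

```python
# Illustrative only: condition number of a Gram matrix before/after a random
# ReLU feature layer (inputs are deliberately correlated via a mean shift).
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 40, 20, 4000
X = rng.standard_normal((n, d)) + 3.0       # mean shift -> correlated rows

K0 = X @ X.T                                # input Gram matrix
H = np.maximum(X @ rng.standard_normal((d, width)) / np.sqrt(d), 0.0)
K1 = H @ H.T / width                        # Gram matrix after one ReLU layer

print(f"cond before: {np.linalg.cond(K0):.1f}, after: {np.linalg.cond(K1):.1f}")
```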