Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for
XOR Data
- URL: http://arxiv.org/abs/2310.01975v1
- Date: Tue, 3 Oct 2023 11:31:37 GMT
- Title: Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for
XOR Data
- Authors: Xuran Meng, Difan Zou, Yuan Cao
- Abstract summary: We show that a ReLU CNN trained by gradient descent can achieve near Bayes-optimal accuracy.
Our result demonstrates that CNNs have a remarkable capacity to efficiently learn XOR problems, even in the presence of highly correlated features.
- Score: 24.86314525762012
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Modern deep learning models are usually highly over-parameterized so that
they can overfit the training data. Surprisingly, such overfitting neural
networks can usually still achieve high prediction accuracy. To study this
"benign overfitting" phenomenon, a line of recent works has theoretically
studied the learning of linear models and two-layer neural networks. However,
most of these analyses are still limited to the very simple learning problems
where the Bayes-optimal classifier is linear. In this work, we investigate a
class of XOR-type classification tasks with label-flipping noise. We show
that, under a certain condition on the sample complexity and signal-to-noise
ratio, an over-parameterized ReLU CNN trained by gradient descent can achieve
near Bayes-optimal accuracy. Moreover, we also establish a matching lower bound
result showing that when the previous condition is not satisfied, the
prediction accuracy of the obtained CNN is an absolute constant away from the
Bayes-optimal rate. Our result demonstrates that CNNs have a remarkable
capacity to efficiently learn XOR problems, even in the presence of highly
correlated features.
Related papers
- On the rates of convergence for learning with convolutional neural networks [9.772773527230134]
We study approximation and learning capacities of convolutional neural networks (CNNs) with one-sided zero-padding and multiple channels.
We derive convergence rates for estimators based on CNNs in many learning problems.
It is also shown that the obtained rates for classification are minimax optimal in some common settings.
arXiv Detail & Related papers (2024-03-25T06:42:02Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Lost Vibration Test Data Recovery Using Convolutional Neural Network: A
Case Study [0.0]
This paper proposes a CNN algorithm for data recovery, using the Alamosa Canyon Bridge as a real-structure case study.
Three different CNN models were considered to predict the data of one and two malfunctioning sensors.
The accuracy of the model was increased by adding a convolutional layer.
arXiv Detail & Related papers (2022-04-11T23:24:03Z) - Do We Really Need a Learnable Classifier at the End of Deep Neural
Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training.
Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z) - Benign Overfitting in Two-layer Convolutional Neural Networks [90.75603889605043]
We study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN).
We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss.
On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant level test loss.
arXiv Detail & Related papers (2022-02-14T07:45:51Z) - Analytic Learning of Convolutional Neural Network For Pattern
Recognition [20.916630175697065]
Training convolutional neural networks (CNNs) with back-propagation (BP) is time-consuming and resource-intensive.
We propose analytic convolutional neural network learning (ACnnL).
ACnnL builds a closed-form solution similar to its counterpart, but differs in its regularization constraints.
arXiv Detail & Related papers (2022-02-14T06:32:21Z) - Benign Overfitting without Linearity: Neural Network Classifiers Trained
by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained to interpolation by gradient descent.
We show that neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error.
In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Approximation and Non-parametric Estimation of ResNet-type Convolutional
Neural Networks [52.972605601174955]
We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)