When do Convolutional Neural Networks Stop Learning?
- URL: http://arxiv.org/abs/2403.02473v1
- Date: Mon, 4 Mar 2024 20:35:09 GMT
- Title: When do Convolutional Neural Networks Stop Learning?
- Authors: Sahan Ahmad, Gabriel Trahan, Aminul Islam
- Abstract summary: Convolutional Neural Networks (CNNs) have demonstrated outstanding performance in computer vision tasks.
Current practice is to stop training when the training loss decreases and the gap between training and validation error increases.
This research work introduces a hypothesis that analyzes the data variation across all the layers of a CNN variant to anticipate its near-optimal learning capacity.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Convolutional Neural Networks (CNNs) have demonstrated outstanding performance in computer vision tasks such as image classification, detection, segmentation, and medical image analysis. In general, an arbitrary number of epochs is used to train such neural networks. In a single epoch, the entire training set, divided into batches, is fed to the network. In practice, validation error together with training loss is used to estimate the neural network's generalization, which indicates the optimal learning capacity of the network. Current practice is to stop training when the training loss decreases and the gap between training and validation error (i.e., the generalization gap) increases, to avoid overfitting. However, this is a trial-and-error-based approach, which raises a critical question: Is it possible to estimate when neural networks stop learning based on training data alone? This research work introduces a hypothesis that analyzes the data variation across all the layers of a CNN variant to anticipate its near-optimal learning capacity. In the training phase, we use our hypothesis to anticipate the near-optimal learning capacity of a CNN variant without using any validation data. Our hypothesis can be deployed as a plug-and-play module with any existing CNN variant without introducing additional trainable parameters to the network. We test our hypothesis on six different CNN variants and three general image datasets (CIFAR10, CIFAR100, and SVHN). Across these CNN variants and datasets, our hypothesis saves 58.49% of computational time in training on average. We further evaluate our hypothesis on ten medical image datasets and compare it with the MedMNIST-V2 benchmark. Based on our experimental results, we save approximately 44.1% of computational time without losing accuracy against the MedMNIST-V2 benchmark.
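
The abstract states the idea only at a high level: monitor how data varies across the layers during training and stop once that variation settles, with no validation set involved. The following PyTorch sketch is an assumed illustration of that kind of criterion, not the authors' actual method: the statistic (per-layer mean activation on a fixed probe batch), the epoch-to-epoch comparison, and the threshold `eps` are all assumptions made here for concreteness.

```python
import torch
import torch.nn as nn


def layer_stats(model: nn.Module, probe_batch: torch.Tensor) -> torch.Tensor:
    """Mean activation of every conv/linear layer on a fixed probe batch.

    This statistic is an assumed stand-in for the paper's "data variation
    across all the layers"; the authors' actual metric is defined in the
    paper and not reproduced here.
    """
    stats, handles = [], []

    def hook(module, inputs, output):
        stats.append(output.detach().float().mean().item())

    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            handles.append(m.register_forward_hook(hook))
    model.eval()
    with torch.no_grad():
        model(probe_batch)
    for h in handles:
        h.remove()
    return torch.tensor(stats)


def train_until_stable(model, loader, optimizer, loss_fn,
                       max_epochs=100, eps=1e-3):
    """Train and stop once per-layer statistics stop changing.

    No validation data is used, mirroring the paper's goal; `eps` and
    the stopping rule itself are illustrative assumptions.
    """
    probe_batch, _ = next(iter(loader))  # fixed batch used to probe layers
    prev = None
    for epoch in range(max_epochs):
        model.train()
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        cur = layer_stats(model, probe_batch)
        if prev is not None and (cur - prev).abs().max().item() < eps:
            print(f"Stopping at epoch {epoch}: layer variation below {eps}")
            break
        prev = cur
```

In the paper's experiments a criterion of this flavor replaces validation-based early stopping, which is what yields the reported savings in training time; everything beyond that high-level idea in the sketch above is an illustrative choice.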
Related papers
- Deep Neural Networks Tend To Extrapolate Predictably (arXiv, 2023-10-02)
Neural network predictions are often assumed to be unpredictable and overconfident on out-of-distribution (OOD) inputs. We observe that neural network predictions instead often tend towards a constant value as input data becomes increasingly OOD. We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks (arXiv, 2023-03-07)
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise. We show that, under mild conditions, a neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
- DCLP: Neural Architecture Predictor with Curriculum Contrastive Learning (arXiv, 2023-02-25)
We propose a Curriculum-guided Contrastive Learning framework for neural Predictors (DCLP). Our method simplifies the contrastive task by designing a novel curriculum that enhances the stability of the unlabeled training data distribution. We experimentally demonstrate that DCLP achieves high accuracy and efficiency compared with existing predictors.
- Boosted Dynamic Neural Networks (arXiv, 2022-11-30)
A typical early-exiting dynamic neural network (EDNN) has multiple prediction heads at different layers of the network backbone. To optimize the model, these prediction heads, together with the network backbone, are trained on every batch of training data. Treating training and testing inputs differently at the two phases causes a mismatch between the training and testing data distributions. We formulate an EDNN as an additive model inspired by gradient boosting and propose multiple training techniques to optimize the model effectively.
- Reconstructing Training Data from Trained Neural Networks (arXiv, 2022-06-15)
We show that, in some cases, a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier. We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods.
- Lost Vibration Test Data Recovery Using Convolutional Neural Network: A Case Study (arXiv, 2022-04-11)
This paper proposes a CNN algorithm for recovering lost vibration test data, using the Alamosa Canyon Bridge as a real structure. Three different CNN models were considered to predict the readings of one or two malfunctioning sensors. The accuracy of the model was increased by adding a convolutional layer.
- Benign Overfitting in Two-layer Convolutional Neural Networks (arXiv, 2022-02-14)
We study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN). We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss. On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant-level test loss.
- Neuron-Specific Dropout: A Deterministic Regularization Technique to Prevent Neural Networks from Overfitting & Reduce Dependence on Large Training Samples (arXiv, 2022-01-13)
NSDropout looks at both the training pass and the validation pass of a layer in a model. By comparing the average values produced by each neuron for each class in a dataset, the network is able to drop targeted units. The layer can thereby identify which features, or noise, the model relies on during testing that are not present in samples from validation (a rough sketch of this selection step appears after this list).
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks (arXiv, 2021-03-14)
We study the robustness of CNNs against white-box and black-box adversarial attacks. Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
- Predicting Neural Network Accuracy from Weights (arXiv, 2020-02-26)
We show experimentally that the accuracy of a trained neural network can be predicted surprisingly well by looking only at its weights. We release a collection of 120k convolutional neural networks trained on four different datasets to encourage further research in this area.
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks (arXiv, 2019-03-24)
We show that a ResNet-type CNN can attain minimax optimal error rates for important function classes. We derive approximation and estimation error rates for such CNNs on the Barron and Hölder classes.
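
The Neuron-Specific Dropout entry above describes its mechanism only in outline. The following is a rough sketch under assumed details, not the authors' implementation: the per-class mean comparison, the threshold `tau`, and the function name are all hypothetical choices used to make the described selection step concrete.

```python
import torch


def nsdropout_mask(train_acts, train_labels, val_acts, val_labels,
                   num_classes, tau=0.1):
    """Hypothetical sketch of Neuron-Specific Dropout unit selection.

    train_acts/val_acts: (N, D) activations of one layer on a training
    and a validation pass. Returns a (D,) mask that zeroes units whose
    per-class mean response differs most between the two passes.
    """
    def class_means(acts, labels):
        means = torch.zeros(num_classes, acts.shape[1])
        for c in range(num_classes):
            sel = labels == c
            if sel.any():
                means[c] = acts[sel].mean(dim=0)
        return means

    # Average over classes of the absolute train/validation gap per unit.
    gap = (class_means(train_acts, train_labels)
           - class_means(val_acts, val_labels)).abs().mean(dim=0)
    return (gap <= tau).float()  # drop units with a large train/val gap
```

A layer would then multiply its activations by the returned mask, deterministically dropping the units whose class-conditional behaviour diverges between the training and validation passes.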