Dropout Reduces Underfitting
- URL: http://arxiv.org/abs/2303.01500v2
- Date: Wed, 31 May 2023 17:47:18 GMT
- Title: Dropout Reduces Underfitting
- Authors: Zhuang Liu, Zhiqiu Xu, Joseph Jin, Zhiqiang Shen, Trevor Darrell
- Abstract summary: In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training.
We find dropout reduces the directional variance of gradients across mini-batches and helps align the mini-batch gradients with the entire dataset's gradient.
Our findings lead us to a solution for improving performance in underfitting models - early dropout: dropout is applied only during the initial phases of training, and turned off afterwards.
- Score: 85.61466286688385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Introduced by Hinton et al. in 2012, dropout has stood the test of time as a
regularizer for preventing overfitting in neural networks. In this study, we
demonstrate that dropout can also mitigate underfitting when used at the start
of training. During the early phase, we find dropout reduces the directional
variance of gradients across mini-batches and helps align the mini-batch
gradients with the entire dataset's gradient. This helps counteract the
stochasticity of SGD and limit the influence of individual batches on model
training. Our findings lead us to a solution for improving performance in
underfitting models - early dropout: dropout is applied only during the initial
phases of training, and turned off afterwards. Models equipped with early
dropout achieve lower final training loss compared to their counterparts
without dropout. Additionally, we explore a symmetric technique for
regularizing overfitting models - late dropout, where dropout is not used in
the early iterations and is only activated later in training. Experiments on
ImageNet and various vision tasks demonstrate that our methods consistently
improve generalization accuracy. Our results encourage more research on
understanding regularization in deep learning and our methods can be useful
tools for future neural network training, especially in the era of large data.
Code is available at https://github.com/facebookresearch/dropout.
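The early/late dropout schedules described above amount to toggling the model's dropout modules at a chosen point in training. Below is a minimal sketch assuming a PyTorch-style training loop; the toy model, the cutoff epoch, and the helper name `set_dropout` are illustrative, and the authors' actual implementation lives in the linked repository.

```python
import torch
import torch.nn as nn

def set_dropout(model: nn.Module, p: float) -> None:
    """Set the drop probability of every nn.Dropout module in the model."""
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = p

# Illustrative model and schedule; the cutoff epoch is a hyperparameter.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

EPOCHS, EARLY_DROPOUT_EPOCHS = 20, 5  # early dropout: active only for the first few epochs

for epoch in range(EPOCHS):
    # Early dropout: keep dropout on at the start, then switch it off.
    # (Late dropout would invert this condition.)
    set_dropout(model, 0.1 if epoch < EARLY_DROPOUT_EPOCHS else 0.0)
    for x, y in [(torch.randn(8, 32), torch.randint(0, 10, (8,)))]:  # stand-in data loader
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```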
Related papers
- A Negative Result on Gradient Matching for Selective Backprop [8.463693396893731]
Training deep neural networks has become a massive computational burden.
One approach to speed up the training process is Selective Backprop.
We build on this approach by choosing the (weighted) subset which best matches the mean gradient over the entire minibatch.
We find that both the loss-based and the gradient-matching strategies fail to consistently outperform the random baseline.
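As a rough illustration of the gradient-matching idea, the sketch below computes per-example gradients and greedily picks a subset whose mean gradient is closest to the full mini-batch gradient. It is a simplified stand-in on a toy model, not the paper's exact weighted-selection procedure.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)              # toy model (illustrative)
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
params = list(model.parameters())

# Per-example gradients, flattened into vectors.
per_example = []
for i in range(x.size(0)):
    loss_i = criterion(model(x[i:i + 1]), y[i:i + 1])
    grads = torch.autograd.grad(loss_i, params)
    per_example.append(torch.cat([g.flatten() for g in grads]))
G = torch.stack(per_example)          # (batch, n_params)
target = G.mean(dim=0)                # mean gradient over the entire minibatch

# Greedily grow a subset whose mean gradient best matches the full-batch gradient.
selected, k = [], 8
for _ in range(k):
    best, best_err = None, float("inf")
    for i in range(x.size(0)):
        if i in selected:
            continue
        err = torch.norm(G[selected + [i]].mean(dim=0) - target).item()
        if err < best_err:
            best, best_err = i, err
    selected.append(best)
print("selected examples:", selected)
```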
arXiv Detail & Related papers (2023-12-08T13:03:10Z)
- Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free Ensembles of DNNs [9.010643838773477]
We introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data.
We show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated.
We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement without any additional cost in training time.
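A minimal sketch of the monitoring idea: track per-epoch correctness of each validation example and count flips from correct to incorrect as a forgetting-based overfit signal. The scoring details here are illustrative, not the paper's exact definition.

```python
import numpy as np

# correct[e, i] = 1 if validation example i was classified correctly after epoch e.
# Illustrative history for 5 epochs and 10 validation examples.
rng = np.random.default_rng(0)
correct = rng.integers(0, 2, size=(5, 10))

# An example is "forgotten" at epoch e if it was correct at e-1 but wrong at e.
forgotten = (correct[:-1] == 1) & (correct[1:] == 0)
forgetting_rate = forgotten.mean(axis=1)       # per-epoch fraction of forgotten examples
print("per-epoch forgetting rate:", forgetting_rate)
print("overfit signal (mean forgetting):", forgetting_rate.mean())
```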
arXiv Detail & Related papers (2023-10-17T09:22:22Z)
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
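To make the weight sharing concrete, the sketch below shows a slimmable linear layer that exposes sub-networks by slicing a shared weight matrix at different width multipliers; it is a generic illustration rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class SlimmableLinear(nn.Module):
    """A linear layer whose output width can be slimmed by slicing shared weights."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor, width_mult: float = 1.0) -> torch.Tensor:
        out = int(self.weight.size(0) * width_mult)      # number of active output units
        return nn.functional.linear(x, self.weight[:out], self.bias[:out])

layer = SlimmableLinear(16, 8)
x = torch.randn(4, 16)
full = layer(x, width_mult=1.0)    # full network: all 8 output units
half = layer(x, width_mult=0.5)    # weight-sharing sub-network: first 4 units
print(full.shape, half.shape)      # torch.Size([4, 8]) torch.Size([4, 4])
```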
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- Implicit regularization of dropout [3.42658286826597]
It is important to understand how dropout, a popular regularization method, aids in achieving a good generalization solution during neural network training.
In this work, we present a theoretical derivation of an implicit regularization of dropout, which is validated by a series of experiments.
We experimentally find that training with dropout leads to a neural network with a flatter minimum compared with standard gradient descent training.
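One common way to probe such a "flatter minimum" claim empirically is to measure how much the loss rises under small random weight perturbations. The sketch below is a generic sharpness probe of that kind, not the paper's theoretical analysis; model and data are placeholders.

```python
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def sharpness(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
              sigma: float = 0.01, trials: int = 10) -> float:
    """Average loss increase under Gaussian weight perturbations of scale sigma."""
    criterion = nn.CrossEntropyLoss()
    base = criterion(model(x), y).item()
    increases = []
    for _ in range(trials):
        noisy = copy.deepcopy(model)
        for p in noisy.parameters():
            p.add_(sigma * torch.randn_like(p))
        increases.append(criterion(noisy(x), y).item() - base)
    return sum(increases) / len(increases)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))  # illustrative model
x, y = torch.randn(64, 16), torch.randint(0, 4, (64,))
print("loss increase under perturbation:", sharpness(model, x, y))
```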
arXiv Detail & Related papers (2022-07-13T04:09:14Z)
- Neuron-Specific Dropout: A Deterministic Regularization Technique to Prevent Neural Networks from Overfitting & Reduce Dependence on Large Training Samples [0.0]
NSDropout looks at both the training pass and the validation pass of a layer in a model.
By comparing the average values produced by each neuron for each class in a dataset, the network is able to drop targeted units.
The layer can predict which features, or noise, the model relies on during testing that are not present in the validation samples.
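As a rough reading of that comparison, the sketch below computes per-class mean activations of one layer on training and validation batches and masks the neurons whose means disagree most. It is an illustrative interpretation, not NSDropout's exact rule, and all tensors are placeholders.

```python
import torch

def discrepancy_mask(train_act, val_act, train_y, val_y, num_classes, drop_k):
    """Mask the drop_k neurons whose per-class mean activations differ most
    between training and validation batches (illustrative, not NSDropout itself)."""
    n_units = train_act.size(1)
    diffs = torch.zeros(n_units)
    for c in range(num_classes):
        tr = train_act[train_y == c].mean(dim=0)
        va = val_act[val_y == c].mean(dim=0)
        diffs += (tr - va).abs()
    mask = torch.ones(n_units)
    mask[diffs.topk(drop_k).indices] = 0.0       # drop the most discrepant units
    return mask

# Illustrative activations of one layer for a 3-class problem.
train_act, val_act = torch.rand(64, 20), torch.rand(32, 20)
train_y, val_y = torch.randint(0, 3, (64,)), torch.randint(0, 3, (32,))
mask = discrepancy_mask(train_act, val_act, train_y, val_y, num_classes=3, drop_k=5)
dropped_output = train_act * mask                # apply the targeted dropout mask
```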
arXiv Detail & Related papers (2022-01-13T13:10:30Z)
- Mixing between the Cross Entropy and the Expectation Loss Terms [89.30385901335323]
Cross-entropy loss tends to focus on hard-to-classify samples during training.
We show that adding the expectation loss to the optimization goal helps the network achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
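A minimal sketch of such a mixed objective, taking the expectation loss here as the expected 0-1 error (one minus the softmax probability of the true class); the mixing weight and the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def mixed_loss(logits: torch.Tensor, targets: torch.Tensor, alpha: float = 0.9) -> torch.Tensor:
    """Mix cross entropy with an expectation-style loss (expected 0-1 error)."""
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=1)
    # Expectation loss: expected misclassification rate, 1 - p(correct class).
    expectation = (1.0 - probs.gather(1, targets.unsqueeze(1))).mean()
    return alpha * ce + (1.0 - alpha) * expectation

logits = torch.randn(8, 5, requires_grad=True)
targets = torch.randint(0, 5, (8,))
loss = mixed_loss(logits, targets)
loss.backward()
```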
arXiv Detail & Related papers (2021-09-12T23:14:06Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to perform inference directly on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization [62.8384110757689]
Overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs).
The advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
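Advanced Dropout's exact parametric prior is not reproduced here; as a generic illustration of an adaptively learned dropout rate, the sketch below uses a Concrete-Dropout-style relaxation in which the drop probability is a trainable parameter.

```python
import torch
import torch.nn as nn

class LearnableDropout(nn.Module):
    """Dropout whose rate is a trainable parameter, via a relaxed-Bernoulli (Concrete)
    mask. A generic stand-in for adaptive dropout-rate methods, not Advanced Dropout."""
    def __init__(self, init_p: float = 0.1, temperature: float = 0.1):
        super().__init__()
        # Parameterize p through a logit so it stays in (0, 1).
        self.p_logit = nn.Parameter(torch.logit(torch.tensor(init_p)))
        self.temperature = temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = torch.sigmoid(self.p_logit)
        if not self.training:
            return x                      # expectation already rescaled at train time
        u = torch.rand_like(x)
        # Relaxed Bernoulli keep-mask (keep probability about 1 - p), differentiable in p.
        keep = torch.sigmoid((torch.log(1 - p) - torch.log(p)
                              + torch.log(u) - torch.log(1 - u)) / self.temperature)
        return x * keep / (1 - p)         # rescale to keep the expectation unchanged

layer = LearnableDropout()
out = layer(torch.randn(4, 8))            # the drop rate now receives gradients via `out`
```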
arXiv Detail & Related papers (2020-10-11T13:19:58Z)
- Do We Need Zero Training Loss After Achieving Zero Training Error? [76.44358201918156]
We propose a direct solution called "flooding" that intentionally prevents further reduction of the training loss when it reaches a reasonably small value.
We experimentally show that flooding improves performance and, as a byproduct, induces a double descent curve of the test loss.
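Flooding has a particularly compact implementation: once the training loss drops below a flood level b, its gradient is flipped so the loss hovers around b. A minimal sketch with an illustrative model and flood level:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                     # illustrative model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
flood_level = 0.05                           # hyperparameter b

x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
loss = criterion(model(x), y)
flooded = (loss - flood_level).abs() + flood_level   # the flooding objective |J - b| + b
optimizer.zero_grad()
flooded.backward()
optimizer.step()
```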
arXiv Detail & Related papers (2020-02-20T12:50:49Z)