Dropout Drops Double Descent
- URL: http://arxiv.org/abs/2305.16179v3
- Date: Sun, 11 Feb 2024 09:35:03 GMT
- Title: Dropout Drops Double Descent
- Authors: Tian-Le Yang, Joe Suzuki
- Abstract summary: This study demonstrates that double descent can be mitigated by adding a dropout layer adjacent to the fully connected linear layer.
Our paper posits that, in linear regression, the test error at the optimal dropout rate decreases monotonically with increasing sample size.
- Score: 1.0878040851637998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study demonstrates that double descent can be mitigated by adding a
dropout layer adjacent to the fully connected linear layer. The unexpected
double-descent phenomenon garnered substantial attention in recent years,
resulting in fluctuating prediction error rates as either sample size or model
size increases. Our paper posits that, in linear regression, the test error at
the optimal dropout rate decreases monotonically with increasing sample size.
Although we do not give a full mathematical proof of this statement, we prove a
closely related result: for each fixed dropout rate within a certain range, the
expected test error decreases as the sample size grows, and we validate
empirically that the test error decreases for every such dropout rate. Our
experimental results substantiate our claim, showing
that dropout with an optimal dropout rate can yield a monotonic test error
curve in nonlinear neural networks. These experiments were conducted using the
Fashion-MNIST and CIFAR-10 datasets. These findings imply the potential benefit
of incorporating dropout into risk curve scaling to address the peak
phenomenon. To our knowledge, this study represents the first investigation
into the relationship between dropout and double descent.
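The linear-regression setting discussed in the abstract can be illustrated with a small sketch. This is not the paper's code; it relies on the well-known equivalence between marginalized input dropout and a data-dependent ridge penalty (with penalty weight p/(1-p) and a diagonal scaling from the column second moments), and all variable names are hypothetical:

```python
import numpy as np

# Hypothetical synthetic setup, not the paper's experiments.
rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

p = 0.3                               # dropout rate
lam = p / (1 - p)                     # equivalent ridge weight
Gamma = np.diag(np.diag(X.T @ X))     # column-wise second moments of X

# Marginalizing Bernoulli input dropout (with 1/(1-p) scaling) in least
# squares yields the objective ||y - Xw||^2 + lam * w^T Gamma w, whose
# minimizer is a generalized ridge solution:
w_drop = np.linalg.solve(X.T @ X + lam * Gamma, X.T @ y)

def expected_dropout_loss(w):
    """Closed-form expectation of the dropout loss over the masks."""
    r = y - X @ w
    return r @ r + lam * w @ Gamma @ w

# The gradient of the marginalized objective vanishes at w_drop.
grad = -2 * X.T @ (y - X @ w_drop) + 2 * lam * Gamma @ w_drop
print(np.linalg.norm(grad))
```

Note that lam = p/(1-p) grows without bound as the dropout rate approaches 1, so sweeping the dropout rate sweeps the effective amount of ridge-style regularization, which is the mechanism by which an optimal rate can smooth out the double-descent peak.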
Related papers
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical
Learning [68.76846801719095]
We show when and where double descent actually appears, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Dropout Regularization Versus $\ell_2$-Penalization in the Linear Model [7.032245866317619]
We study the statistical behavior of gradient descent iterates with dropout in the linear regression model.
We indicate a more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout.
arXiv Detail & Related papers (2023-06-18T11:17:15Z) - Input Perturbation Reduces Exposure Bias in Diffusion Models [41.483581603727444]
We show that a long sampling chain leads to an error accumulation phenomenon, similar to the exposure bias problem in autoregressive text generation.
We propose a very simple but effective training regularization, consisting in perturbing the ground truth samples to simulate the inference time prediction errors.
We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality.
arXiv Detail & Related papers (2023-01-27T13:34:54Z) - Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z) - Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z) - Efficient Causal Inference from Combined Observational and
Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z) - Contextual Dropout: An Efficient Sample-Dependent Dropout Module [60.63525456640462]
Dropout has been demonstrated as a simple and effective module to regularize the training process of deep neural networks.
We propose contextual dropout with an efficient structural design as a simple and scalable sample-dependent dropout module.
Our experimental results show that the proposed method outperforms baseline methods in terms of both accuracy and quality of uncertainty estimation.
arXiv Detail & Related papers (2021-03-06T19:30:32Z) - Advanced Dropout: A Model-free Methodology for Bayesian Dropout
Optimization [62.8384110757689]
Overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs)
The advanced dropout technique applies a model-free and easily implemented distribution with parametric prior, and adaptively adjusts dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
arXiv Detail & Related papers (2020-10-11T13:19:58Z) - Double Trouble in Double Descent : Bias and Variance(s) in the Lazy
Regime [32.65347128465841]
Deep neural networks can achieve remarkable performances while interpolating the training data perfectly.
Rather than the U-curve of the bias-variance trade-off, their test error often follows a "double descent" curve.
We develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks.
arXiv Detail & Related papers (2020-03-02T17:39:31Z) - The Implicit and Explicit Regularization Effects of Dropout [43.431343291010734]
Dropout is a widely-used regularization technique, often required to obtain state-of-the-art performance for a number of architectures.
This work demonstrates that dropout introduces two distinct but entangled regularization effects.
arXiv Detail & Related papers (2020-02-28T18:31:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.