Dropout Drops Double Descent
- URL: http://arxiv.org/abs/2305.16179v3
- Date: Sun, 11 Feb 2024 09:35:03 GMT
- Title: Dropout Drops Double Descent
- Authors: Tian-Le Yang, Joe Suzuki
- Abstract summary: This study demonstrates that double descent can be mitigated by adding a dropout layer adjacent to the fully connected linear layer.
Our paper posits that, in linear regression, the test error at the optimal dropout rate decreases monotonically with increasing sample size.
- Score: 1.0878040851637998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study demonstrates that double descent can be mitigated by adding a
dropout layer adjacent to the fully connected linear layer. The unexpected
double-descent phenomenon garnered substantial attention in recent years,
resulting in fluctuating prediction error rates as either sample size or model
size increases. Our paper posits that, in linear regression, the test error at
the optimal dropout rate decreases monotonically with increasing sample size.
Although we do not give a full mathematical proof of this statement, we prove a
closely related result: for each fixed dropout rate within a certain range, the
expected test error decreases as the sample size grows, and we validate
empirically that the test error decreases for every such dropout rate. Our
experimental results substantiate our claim, showing
that dropout with an optimal dropout rate can yield a monotonic test error
curve in nonlinear neural networks. These experiments were conducted using the
Fashion-MNIST and CIFAR-10 datasets. These findings imply the potential benefit
of incorporating dropout into risk curve scaling to address the peak
phenomenon. To our knowledge, this study represents the first investigation
into the relationship between dropout and double descent.
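The linear-regression setting discussed in the abstract can be illustrated with a small sketch. This is not the paper's code; it relies on the well-known equivalence between marginalized input dropout and a data-dependent ridge penalty (with penalty weight p/(1-p) and a diagonal scaling from the column second moments), and all variable names are hypothetical:

```python
import numpy as np

# Hypothetical synthetic setup, not the paper's experiments.
rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

p = 0.3                               # dropout rate
lam = p / (1 - p)                     # equivalent ridge weight
Gamma = np.diag(np.diag(X.T @ X))     # column-wise second moments of X

# Marginalizing Bernoulli input dropout (with 1/(1-p) scaling) in least
# squares yields the objective ||y - Xw||^2 + lam * w^T Gamma w, whose
# minimizer is a generalized ridge solution:
w_drop = np.linalg.solve(X.T @ X + lam * Gamma, X.T @ y)

def expected_dropout_loss(w):
    """Closed-form expectation of the dropout loss over the masks."""
    r = y - X @ w
    return r @ r + lam * w @ Gamma @ w

# The gradient of the marginalized objective vanishes at w_drop.
grad = -2 * X.T @ (y - X @ w_drop) + 2 * lam * Gamma @ w_drop
print(np.linalg.norm(grad))
```

Note that lam = p/(1-p) grows without bound as the dropout rate approaches 1, so sweeping the dropout rate sweeps the effective amount of ridge-style regularization, which is the mechanism by which an optimal rate can smooth out the double-descent peak.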
Related papers
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical
Learning [68.76846801719095]
We show when and where double descent actually appears, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Dropout Regularization Versus $\ell_2$-Penalization in the Linear Model [7.032245866317619]
We study the statistical behavior of gradient descent iterates with dropout in the linear regression model.
We indicate a more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout.
arXiv Detail & Related papers (2023-06-18T11:17:15Z) - Input Perturbation Reduces Exposure Bias in Diffusion Models [41.483581603727444]
We show that a long sampling chain leads to an error accumulation phenomenon, similar to the exposure bias problem in autoregressive text generation.
We propose a very simple but effective training regularization, consisting in perturbing the ground truth samples to simulate the inference time prediction errors.
We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality.
arXiv Detail & Related papers (2023-01-27T13:34:54Z) - Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z) - Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z) - Efficient Causal Inference from Combined Observational and
Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z) - Contextual Dropout: An Efficient Sample-Dependent Dropout Module [60.63525456640462]
Dropout has been demonstrated as a simple and effective module to regularize the training process of deep neural networks.
We propose contextual dropout with an efficient structural design as a simple and scalable sample-dependent dropout module.
Our experimental results show that the proposed method outperforms baseline methods in terms of both accuracy and quality of uncertainty estimation.
arXiv Detail & Related papers (2021-03-06T19:30:32Z) - Advanced Dropout: A Model-free Methodology for Bayesian Dropout
Optimization [62.8384110757689]
Overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs)
The advanced dropout technique applies a model-free and easily implemented distribution with parametric prior, and adaptively adjusts dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
arXiv Detail & Related papers (2020-10-11T13:19:58Z) - Double Trouble in Double Descent : Bias and Variance(s) in the Lazy
Regime [32.65347128465841]
Deep neural networks can achieve remarkable performances while interpolating the training data perfectly.
Rather than the U-curve of the bias-variance trade-off, their test error often follows a "double descent" curve.
We develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks.
arXiv Detail & Related papers (2020-03-02T17:39:31Z) - The Implicit and Explicit Regularization Effects of Dropout [43.431343291010734]
Dropout is a widely-used regularization technique, often required to obtain state-of-the-art performance for a number of architectures.
This work demonstrates that dropout introduces two distinct but entangled regularization effects.
arXiv Detail & Related papers (2020-02-28T18:31:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.