Related papers: The Implicit and Explicit Regularization Effects of Dropout

URL: http://arxiv.org/abs/2002.12915v3
Date: Thu, 15 Oct 2020 07:44:22 GMT
Title: The Implicit and Explicit Regularization Effects of Dropout
Authors: Colin Wei, Sham Kakade, Tengyu Ma
Abstract summary: Dropout is a widely-used regularization technique, often required to obtain state-of-the-art for a number of architectures. This work demonstrates that dropout introduces two distinct but entangled regularization effects.
Score: 43.431343291010734
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dropout is a widely-used regularization technique, often required to obtain state-of-the-art for a number of architectures. This work demonstrates that dropout introduces two distinct but entangled regularization effects: an explicit effect (also studied in prior work) which occurs since dropout modifies the expected training objective, and, perhaps surprisingly, an additional implicit effect from the stochasticity in the dropout training update. This implicit regularization effect is analogous to the effect of stochasticity in small mini-batch stochastic gradient descent. We disentangle these two effects through controlled experiments. We then derive analytic simplifications which characterize each effect in terms of the derivatives of the model and the loss, for deep neural networks. We demonstrate these simplified, analytic regularizers accurately capture the important aspects of dropout, showing they faithfully replace dropout in practice.
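
The explicit effect mentioned in the abstract (dropout changing the expected training objective) can be illustrated in a much simpler setting than the paper's. The sketch below is not the paper's deep-network regularizer. It assumes a linear model with squared loss and inverted dropout applied to the inputs, where the expected dropout loss is known to equal the plain loss plus a data-dependent L2 penalty, p/(1-p) * sum_i w_i^2 x_i^2. The function names and the toy values are hypothetical, chosen only for illustration. The expectation is computed exactly by enumerating all dropout masks, so no sampling noise is involved.

```python
import itertools
import numpy as np

def expected_dropout_loss(w, x, y, p):
    """Exact expected squared loss under inverted dropout on the inputs.

    Each input coordinate is kept with probability q = 1 - p and rescaled
    by 1/q. The expectation is taken by enumerating all 2^d masks.
    """
    q = 1.0 - p
    total = 0.0
    for mask in itertools.product([0, 1], repeat=len(x)):
        m = np.array(mask, dtype=float)
        prob = np.prod(np.where(m == 1, q, p))  # probability of this mask
        x_drop = x * m / q                      # inverted-dropout input
        total += prob * (y - w @ x_drop) ** 2
    return total

def explicit_regularizer(w, x, p):
    """Penalty added to the plain loss by dropout in this linear setting."""
    return p / (1.0 - p) * np.sum(w ** 2 * x ** 2)

# Toy values (assumptions for illustration only).
rng = np.random.default_rng(0)
w = rng.normal(size=4)
x = rng.normal(size=4)
y = 0.5
p = 0.3

plain_loss = (y - w @ x) ** 2
lhs = expected_dropout_loss(w, x, y, p)
rhs = plain_loss + explicit_regularizer(w, x, p)
print("expected dropout loss:", lhs)
print("plain loss + penalty: ", rhs)
```

The two printed values should agree up to floating-point error. This captures only the explicit effect: the implicit effect described in the abstract comes from the noise of individual dropout updates, which averaging over all masks removes by construction.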

Related papers

The Epochal Sawtooth Effect: Unveiling Training Loss Oscillations in Adam and Other Optimizers [8.770864706004472]
We identify and analyze a recurring training loss pattern, which we term the Epochal Sawtooth Effect (ESE). This pattern is characterized by a sharp drop in loss at the beginning of each epoch, followed by a gradual increase, resulting in a sawtooth-shaped loss curve. We provide an in-depth explanation of the underlying mechanisms that lead to the Epochal Sawtooth Effect.
arXiv Detail & Related papers (2024-10-14T00:51:21Z)
Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts. We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss gradient norms are highly dependent on timestep. We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z)
On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics. The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
Stochastic Modified Equations and Dynamics of Dropout Algorithm [4.811269936680572]
Dropout is a widely utilized regularization technique in the training of neural networks. Its underlying mechanism and its impact on achieving good generalization abilities remain poorly understood.
arXiv Detail & Related papers (2023-05-25T08:42:25Z)
Dropout Reduces Underfitting [85.61466286688385]
In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training. We find dropout reduces the directional variance of gradients across mini-batches and helps align the mini-batch gradients with the entire dataset's gradient. Our findings lead us to a solution for improving performance in underfitting models - early dropout: dropout is applied only during the initial phases of training, and turned off afterwards.
arXiv Detail & Related papers (2023-03-02T18:59:15Z)
Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero. More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
Implicit regularization of dropout [3.42658286826597]
It is important to understand how dropout, a popular regularization method, aids in achieving a good generalization solution during neural network training. In this work, we present a theoretical derivation of an implicit regularization of dropout, which is validated by a series of experiments. We experimentally find that training with dropout leads to a neural network with a flatter minimum compared with standard gradient descent training.
arXiv Detail & Related papers (2022-07-13T04:09:14Z)
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization [125.5448293005647]
We discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting. Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions. We propose a simple and effective explicit regularizer, called DR3, that counteracts the undesirable effects of this implicit regularizer.
arXiv Detail & Related papers (2021-12-09T06:01:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.