Revisiting Structured Dropout
- URL: http://arxiv.org/abs/2210.02570v1
- Date: Wed, 5 Oct 2022 21:26:57 GMT
- Title: Revisiting Structured Dropout
- Authors: Yiren Zhao, Oluwatomisin Dada, Xitong Gao, Robert D Mullins
- Abstract summary: \textbf{\emph{ProbDropBlock}} drops contiguous blocks from feature maps with a probability given by the normalized feature salience values.
We find that, with a simple scheduling strategy, the proposed approach to structured Dropout consistently improves model performance compared to baselines.
- Score: 11.011268090482577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large neural networks are often overparameterised and prone to overfitting.
Dropout is a widely used regularization technique to combat overfitting and
improve model generalization. However, unstructured Dropout is not always
effective for specific network architectures, and this has led to the development
of multiple structured Dropout approaches that improve model performance and,
sometimes, reduce the computational resources required for inference. In this
work, we revisit structured Dropout, comparing different Dropout approaches on
natural language processing and computer vision tasks across multiple
state-of-the-art networks. Additionally, we devise an approach to structured
Dropout we call \textbf{\emph{ProbDropBlock}}, which drops contiguous blocks
from feature maps with a probability given by the normalized feature salience
values. We find that, with a simple scheduling strategy, the proposed approach to
structured Dropout consistently improves model performance compared to
baselines and other Dropout approaches on a diverse range of tasks and models.
In particular, we show that \textbf{\emph{ProbDropBlock}} improves RoBERTa
fine-tuning on MNLI by $0.22\%$ and ResNet50 training on ImageNet by $0.28\%$.
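The abstract does not spell out the exact ProbDropBlock algorithm, but its description (contiguous blocks dropped with probability given by normalized feature salience) suggests a DropBlock-style mask whose centre probabilities are weighted by salience. The following PyTorch sketch is an illustration under those assumptions; the function name, the choice of |activation| as salience, and the defaults base_drop_rate and block_size are ours, not the authors'.

```python
import torch
import torch.nn.functional as F

def prob_drop_block(x, base_drop_rate=0.1, block_size=5, training=True):
    """Salience-weighted structured dropout on a feature map x of shape (N, C, H, W).

    Assumed reading of the abstract: per-location salience is the normalized
    absolute activation, block centres are dropped with probability
    base_drop_rate scaled by that salience, and each dropped centre is grown
    into a block_size x block_size region, as in DropBlock.
    """
    if not training or base_drop_rate == 0.0:
        return x
    n, c, h, w = x.shape
    # Normalize |activation| so it averages to 1 over each feature map; the
    # per-location drop probability then averages to base_drop_rate.
    salience = x.abs()
    salience = salience / salience.sum(dim=(-2, -1), keepdim=True).clamp_min(1e-12)
    centre_prob = (base_drop_rate * salience * h * w).clamp(max=1.0)
    # Sample block centres, then grow them into contiguous blocks (an odd
    # block_size keeps the spatial size unchanged with this padding).
    centres = torch.bernoulli(centre_prob)
    block_mask = F.max_pool2d(centres, kernel_size=block_size, stride=1,
                              padding=block_size // 2)
    keep_mask = 1.0 - block_mask
    # Rescale the surviving activations to preserve the expected magnitude.
    return x * keep_mask * keep_mask.numel() / keep_mask.sum().clamp_min(1.0)
```

The abstract also credits a simple scheduling strategy; one common choice (an assumption here, not necessarily the paper's schedule) is to ramp base_drop_rate linearly from 0 to its target value over training.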
Related papers
- Lightweight Diffusion Models with Distillation-Based Block Neural
Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with Diffusion Distillation-based Block-wise Neural Architecture Search (DiffNAS).
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
arXiv Detail & Related papers (2023-11-08T12:56:59Z) - R-Block: Regularized Block of Dropout for convolutional networks [0.0]
Dropout as a regularization technique is widely used in fully connected layers but is less effective in convolutional layers.
In this paper, we apply a mutual learning training strategy for convolutional layer regularization, namely R-Block.
We show that R-Block achieves better performance than other existing structured dropout variants.
arXiv Detail & Related papers (2023-07-27T18:53:14Z) - R-Drop: Regularized Dropout for Neural Networks [99.42791938544012]
Dropout is a powerful and widely used technique to regularize the training of deep neural networks.
We introduce a simple regularization strategy upon dropout in model training, namely R-Drop, which forces the output distributions of different sub-models to be consistent with each other (a minimal sketch of this objective appears after this list).
arXiv Detail & Related papers (2021-06-28T08:01:26Z) - UniDrop: A Simple yet Effective Technique to Improve Transformer without
Extra Cost [110.67392881417777]
The Transformer architecture has achieved great success on a wide range of natural language processing tasks.
We find that simple techniques such as dropout can greatly boost model performance with careful design.
Specifically, we propose an approach named UniDrop that unites three different dropout techniques.
arXiv Detail & Related papers (2021-04-11T07:43:19Z) - Contextual Dropout: An Efficient Sample-Dependent Dropout Module [60.63525456640462]
Dropout has been demonstrated as a simple and effective module to regularize the training process of deep neural networks.
We propose contextual dropout with an efficient structural design as a simple and scalable sample-dependent dropout module.
Our experimental results show that the proposed method outperforms baseline methods in terms of both accuracy and quality of uncertainty estimation.
arXiv Detail & Related papers (2021-03-06T19:30:32Z) - AutoDropout: Learning Dropout Patterns to Regularize Deep Networks [82.28118615561912]
Dropout or weight decay methods do not leverage the structures of the network's inputs and hidden states.
We show that this method works well for image recognition on CIFAR-10 and ImageNet, as well as for language modeling on Penn Treebank and WikiText-2.
The learned dropout patterns also transfer to different tasks and datasets, such as from language modeling on Penn Treebank to English-French translation on WMT 2014.
arXiv Detail & Related papers (2021-01-05T19:54:22Z) - Advanced Dropout: A Model-free Methodology for Bayesian Dropout
Optimization [62.8384110757689]
Overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs).
The advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior and adaptively adjusts the dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
arXiv Detail & Related papers (2020-10-11T13:19:58Z) - DropCluster: A structured dropout for convolutional networks [0.7489179288638513]
Dropout as a regularizer in deep neural networks has been less effective in convolutional layers than in fully connected layers.
We introduce a novel structured regularization for convolutional layers, which we call DropCluster.
Our approach achieves better performance than DropBlock or other existing structured dropout variants.
arXiv Detail & Related papers (2020-02-07T20:02:47Z)
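As referenced in the R-Drop entry above, forcing the output distributions of dropout-sampled sub-models to agree can be written as a symmetric KL term added to the task loss. The sketch below is an illustrative PyTorch rendering under that reading; the weight alpha and the helper name r_drop_loss are placeholders, not names or values from the paper.

```python
import torch.nn.functional as F

def r_drop_loss(model, inputs, labels, alpha=1.0):
    """R-Drop-style objective: two stochastic forward passes through the same
    model (dropout active) give two sub-models; their cross-entropy losses are
    averaged and a symmetric KL term pushes their predictions to agree."""
    logits1 = model(inputs)  # first dropout-sampled sub-model
    logits2 = model(inputs)  # second, independently sampled sub-model
    ce = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))
    logp1 = F.log_softmax(logits1, dim=-1)
    logp2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(logp1, logp2, log_target=True, reduction="batchmean")
                + F.kl_div(logp2, logp1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl
```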