Macro-block dropout for improved regularization in training end-to-end
speech recognition models
- URL: http://arxiv.org/abs/2212.14149v1
- Date: Thu, 29 Dec 2022 02:09:49 GMT
- Title: Macro-block dropout for improved regularization in training end-to-end
speech recognition models
- Authors: Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung
- Abstract summary: We define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN).
Rather than applying dropout to each unit, we apply random dropout to each macro-block.
This algorithm has the effect of applying a different dropout rate to each layer even while keeping a constant average dropout rate.
- Score: 26.06529184341722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a new regularization algorithm referred to as macro-block
dropout. The overfitting issue has been a difficult problem in training large
neural network models. The dropout technique has proven to be simple yet very
effective for regularization by preventing complex co-adaptations during
training. In our work, we define a macro-block that contains a large number of
units from the input to a Recurrent Neural Network (RNN). Rather than applying
dropout to each unit, we apply random dropout to each macro-block. This
algorithm has the effect of applying a different dropout rate to each layer even
while keeping a constant average dropout rate, which yields a better
regularization effect. In our experiments with a Recurrent Neural
Network-Transducer (RNN-T), this algorithm achieves relative Word Error Rate
(WER) improvements of 4.30% and 6.13% over conventional dropout on LibriSpeech
test-clean and test-other. With an Attention-based Encoder-Decoder (AED) model,
it achieves relative WER improvements of 4.36% and 5.85% over conventional
dropout on the same test sets.
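A minimal PyTorch-style sketch of the macro-block idea described in the abstract. It assumes the input to an RNN layer is split into a fixed number of contiguous macro-blocks along the feature dimension, that one keep/drop decision per block is drawn per utterance and shared across time steps, and that activations are rescaled as in inverted dropout. The function name, block layout, and scaling are illustrative assumptions, not the authors' exact recipe.

```python
import torch

def macro_block_dropout(x: torch.Tensor, num_blocks: int = 4,
                        p: float = 0.2, training: bool = True) -> torch.Tensor:
    """Drop whole contiguous feature blocks instead of individual units.

    x: (batch, time, feature) input to an RNN layer.
    num_blocks: number of macro-blocks along the feature dimension (assumed).
    p: probability of dropping each macro-block.
    """
    if not training or p == 0.0:
        return x
    batch, time, feat = x.shape
    # One keep/drop decision per (utterance, macro-block), shared over time steps.
    keep = (torch.rand(batch, 1, num_blocks, device=x.device) >= p).to(x.dtype)
    # Expand each block decision to the units it covers.
    block_size = -(-feat // num_blocks)              # ceiling division
    mask = keep.repeat_interleave(block_size, dim=2)[:, :, :feat]
    # Inverted-dropout rescaling (an assumption; the paper's scaling may differ).
    return x * mask / (1.0 - p)

# Example: a (2, 50, 256) RNN input with 4 macro-blocks of 64 units each.
x = torch.randn(2, 50, 256)
y = macro_block_dropout(x, num_blocks=4, p=0.2, training=True)
```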
Related papers
- FlexiDrop: Theoretical Insights and Practical Advances in Random Dropout Method on GNNs [4.52430575477004]
We propose a novel random dropout method for Graph Neural Networks (GNNs) called FlexiDrop.
We show that our method enables adaptive adjustment of the dropout rate and theoretically balances the trade-off between model complexity and generalization ability.
arXiv Detail & Related papers (2024-05-30T12:48:44Z) - InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
arXiv Detail & Related papers (2023-06-20T03:03:04Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs a label distribution at each cascaded block and does not require generating additional negative samples.
In our framework, each block can be trained independently, so it can easily be deployed on parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Dropout Reduces Underfitting [85.61466286688385]
- Dropout Reduces Underfitting [85.61466286688385]
In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training.
We find dropout reduces the directional variance of gradients across mini-batches and helps align the mini-batch gradients with the entire dataset's gradient.
Our findings lead us to a solution for improving performance in underfitting models, which we call early dropout: dropout is applied only during the initial phases of training and turned off afterwards.
arXiv Detail & Related papers (2023-03-02T18:59:15Z) - Distribution Mismatch Correction for Improved Robustness in Deep Neural
- Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods increase a network's vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z) - AutoDropout: Learning Dropout Patterns to Regularize Deep Networks [82.28118615561912]
Dropout or weight decay methods do not leverage the structures of the network's inputs and hidden states.
We show that this method works well for image recognition on CIFAR-10 and ImageNet, as well as for language modeling on Penn Treebank and WikiText-2.
The learned dropout patterns also transfer to different tasks and datasets, such as from language modeling on Penn Treebank to English-French translation on WMT 2014.
arXiv Detail & Related papers (2021-01-05T19:54:22Z) - Advanced Dropout: A Model-free Methodology for Bayesian Dropout
Optimization [62.8384110757689]
Overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs).
The advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
arXiv Detail & Related papers (2020-10-11T13:19:58Z) - MaxDropout: Deep Neural Network Regularization Based on Maximum Output
Values [0.0]
MaxDropout is a regularizer for deep neural network models that works in a supervised fashion by removing prominent neurons.
We show that it is possible to improve existing neural networks and obtain better results when Dropout is replaced by MaxDropout.
arXiv Detail & Related papers (2020-07-27T17:55:54Z) - Fast Monte Carlo Dropout and Error Correction for Radio Transmitter
- Fast Monte Carlo Dropout and Error Correction for Radio Transmitter Classification [2.0711789781518752]
Monte Carlo dropout may effectively capture model uncertainty in deep learning, where a measure of uncertainty is obtained by using multiple instances of dropout at test time.
We apply these techniques to the radio frequency (RF) transmitter classification problem and show that the proposed algorithm is able to provide better prediction uncertainty than the simple ensemble average algorithm.
arXiv Detail & Related papers (2020-01-31T17:26:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.